Abstract. Recent empirical work [35] has suggested the existence of a size threshold for the existence
of clusters within many real-world networks. We give the first proof that this clustering size threshold
exists within a real-world random network model, and determine the asymptotic value at which it occurs.

More precisely, we choose the Community Guided Attachment (CGA) random network model of
Leskovec, Kleinberg, and Faloutsos [34]. The model is non-uniform and contains self-similar communities,
and has been shown to have many properties of real-world networks. To capture the notion of clustering,
we follow Mishra et al. [42], who defined a type of clustering for real-world networks: an $(\alpha, \beta)$-cluster
is a set that is both internally dense (to the extent given by the parameter $\beta$) and externally sparse (to
the extent given by the parameter $\alpha$). With this definition of clustering, we show the existence of a size
threshold of $(\ln n)^{1/2}$ for the existence of clusters in the CGA model. For all $\epsilon > 0$, a.a.s. clusters larger
than $(\ln n)^{1/2-\epsilon}$ exist, whereas a.a.s. clusters larger than $(\ln n)^{1/2+\epsilon}$ do not exist. Moreover, we show a
size bound on the existence of small, constant-size clusters.
1. Introduction

Real-world networks are everywhere. Examples include the network formed by the connections between 
people in a city; the network of citations between academic papers; an electric power grid; and the network 
of physical interactions between proteins [44]. Despite their differing origins, empirical observation has
shown these networks to share many properties. These include: the small-world effect (the average
shortest path between two nodes in a real-world network is smaller than one might expect, and may
shrink over time) [52, 30]; the scale-free property (the degree distribution of nodes in the network follows
a power law) [10, 21]; and clustering (certain parts of the network are much more closely connected than
their surrounding neighbourhood) [52, 11]. The present work concerns this last property of clustering.
Clustering is of great importance in biology, sociology, and computer science (see e.g. [3, 27, 43, 46, 48];
[23] lists hundreds of others), and plays an important role in understanding the structure of
real-world networks [52, 11, 23, 24]. Despite this, our work is the first we know of to study clustering
analytically in any random model for real-world networks. 

Clusters in real-world networks often overlap [42, 3]; that is, a single node may be a member of more 
than one cluster. One can imagine that a computer in a computer network may belong to multiple groups 
(corresponding to clusters); similarly, a person in a social network may have multiple groups of friends. 
However, most approaches to clustering partition the network without allowing for clusters to overlap. In 
2007, Mishra, Schreiber, Stanton, and Tarjan [42] proposed the $(\alpha, \beta)$-cluster as a new formulation of
clustering. In this definition, a cluster is a set that is both externally sparse (each node outside the set is
connected to only a few nodes inside the set, as determined by the parameter $\alpha$) and internally dense
(each node in the set is connected to many others inside the set, as determined by the parameter $\beta$).
This definition, motivated by real-world networks such as social networks, allows for overlapping clusters. 

A measure related to clustering is that of conductance. The conductance of a set is the ratio between 
the number of "cut edges" (the edges between the set and its complement) and the number of internal 
edges in that set. A set with low conductance has many internal edges and few "cut" edges, and is 
therefore intuitively a good cluster. Recently, Leskovec, Lang, Dasgupta, and Mahoney [35] empirically
examined how the conductance of the best conductance clusters changed as cluster size increased. They 
found similar behaviour in many existing real-world networks: below a certain size threshold, good clusters 
exist; moreover, increasing cluster size below the threshold improves the quality of the best cluster. Above 
the threshold, however, increasing the cluster size decreases the quality of the best cluster. 
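A minimal Python sketch of the conductance ratio exactly as it is defined at the start of this paragraph (cut edges divided by internal edges). Other sources normalize by edge volume instead, so this is not a general-purpose conductance routine; the function name and the edge-list input are assumptions of this example.

```python
def conductance(edges, S):
    """Cut edges divided by internal edges for the vertex set S (the ratio described above).

    edges: iterable of undirected edges (u, v); S: collection of vertices.
    Returns float("inf") when S spans no internal edge.
    """
    S = set(S)
    cut = internal = 0
    for u, v in edges:
        inside = (u in S) + (v in S)  # number of endpoints lying in S
        if inside == 2:
            internal += 1
        elif inside == 1:
            cut += 1
    return cut / internal if internal else float("inf")


# A triangle {0, 1, 2} joined to a path 2-3-4: one cut edge, three internal edges.
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4)]
print(conductance(edges, {0, 1, 2}))
```

A lower value means relatively few edges leave the set, which matches the intuition that such a set is a good cluster.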

In other words, there appears to exist a size threshold for conductance clusters in real-world networks. 
In this work, we give the first proof of a clustering size threshold in any model. More precisely, we choose 
to work with $(\alpha, \beta)$-clusters. Because $(\alpha, \beta)$-clusters were created specifically for types of real-world
networks, they are a natural formulation to choose. Instead of an empirical approach, our method will 
be to study clustering analytically within an existing model of a real-world random network. The benefits 
of this approach are two-fold: first, it allows us to consider the asymptotic behaviour of clusters as the 
network size grows; and second, it allows us to prove directly the existence of a threshold. 

Many random graph models that contain properties of real-world networks have been proposed (e.g.
[52, 5, 30, 11, 32]). In 2005, Leskovec et al. [34] observed that real-world networks obey the
additional property of densification (the average degree of a node grows over time), and proposed a
model called Community Guided Attachment (CGA). Because this model is built from a self-similar
structure of nested communities, it exhibits non-trivial clustering in a way that previous models do not.
It was also the first model to exhibit densification in addition to being scale-free. Furthermore, its simple
mathematical description makes it amenable to analysis; other models that exhibit densification have not
tended to permit analysis [32]. For these reasons, we choose to work with the CGA model, which will be
defined fully in Section 2.

The goal of the present work is to analyze conditions under which $(\alpha, \beta)$-clusters occur in the CGA
model for real-world networks. For every fixed $0 < \alpha, \beta < 1$, we establish a cluster size threshold of
$(\ln n)^{1/2}$ (where $n$ is the size of the network): for all $\epsilon > 0$, a.a.s. there are clusters larger than $(\ln n)^{1/2-\epsilon}$,
while a.a.s. there are no clusters larger than $(\ln n)^{1/2+\epsilon}$. Furthermore, we show a size bound on the
existence of small, constant-size clusters.

Our work is the first instance we know of that studies the existence of any notion of clustering
analytically in a random model for real-world networks. 

2. Model and Definitions 

2.1. CGA Random Graph Model. The community-guided attachment (CGA) random graph model
was first proposed by Leskovec, Kleinberg, and Faloutsos [34], in response to the observation that real-world
random graphs tend to have average node degrees that increase over time, a property known as
densification. It was the first model proposed with this property. Several other models with densification
exist: Leskovec et al. also proposed the "forest fire" model [34]; the model in [36] can be shown to have
densification and other real-world network properties. Lattanzi and Sivakumar's recent affiliation network
model [32] is an example of a densifying model that can admit analysis.

The CGA model is based on levels of nested, self-similar communities. This is natural, because 
real-world networks often exhibit some level of self-similarity. For example, a computer network may 
be decomposed into several sub-networks, based on geography or purpose. Likewise, each of these 
sub-networks may themselves be further decomposed into smaller groups. A pair of computers sharing
membership in one of these small groups is much more likely to be connected than two computers
chosen at random from within the large network. This self-similar hierarchical structure is also observable
in other domains. For example, it has been argued to apply to social groupings [53], subject classification
of patents [34], and topic classification of Web pages [39]. The CGA model itself has previously been
used to describe peer-to-peer networks [18].

The nested form of CGA makes it a natural choice of model to study clustering. Random graph models 
using fixed power-law degree sequences (e.g. [40]) are a.a.s. locally tree-like, and hence exhibit no
clustering at all. Methods such as Watts and Strogatz' small-world model [52] start with a regular local
graph structure such as a cycle or grid, and then add or re-wire some number of edges at random to insert 
long-range edges. While these models exhibit more clustering than a uniform random graph, it is trivial 
clustering: clusters are determined by the original, deterministic local structure rather than the random 
edges. In CGA, the nested communities mean that small, more dense communities are more likely to 
contribute to clusters than larger, less dense ones. As we will show, this implies the existence of varied 
clusters of different sizes, formed by random edges. 

We now define the model precisely. 

Definition. Let $T$ be a complete tree of height $H$, with constant fan-out $b$ (that is, each non-leaf node
has exactly $b$ children). Let $n = b^H$ be the number of leaves of $T$. We will construct a random undirected
graph $G = (V, E)$ whose nodes $V$ are the leaves of $T$. Given two nodes $u, v \in V$, we define the height
$h(u, v)$ to be the height of the smallest subtree in $T$ that contains both $u$ and $v$. (In other words, $h(u, v)$
is one half of the distance between $u$ and $v$ in $T$.) For a parameter $c > 1$, the probability that our random
graph $G$ will have an edge from $u$ to $v$ will be equal to $c^{-h(u, v)}$.
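Purely as an illustration of the definition, the following Python sketch samples an undirected CGA graph. It assumes (a convention of this sketch, not of the model) that leaves are numbered $0, \dots, n-1$ from left to right, so that $u$ and $v$ share a subtree of height $k$ exactly when $\lfloor u/b^k \rfloor = \lfloor v/b^k \rfloor$; the function names are hypothetical.

```python
import itertools
import random


def pair_height(u, v, b):
    """Height h(u, v) of the smallest subtree of the b-ary tree containing leaves u and v.

    Leaves are numbered 0..b**H - 1 from left to right (an assumption of this sketch),
    so moving one level up the tree maps a leaf index x to x // b.
    """
    h = 0
    while u != v:
        u //= b
        v //= b
        h += 1
    return h


def sample_cga(H, b, c, seed=None):
    """Sample an undirected CGA graph on n = b**H leaves.

    Each pair (u, v) becomes an edge independently with probability c**(-h(u, v)).
    Returns (n, list_of_edges) with every edge stored as (u, v), u < v.
    """
    rng = random.Random(seed)
    n = b ** H
    edges = []
    for u, v in itertools.combinations(range(n), 2):
        if rng.random() < c ** (-pair_height(u, v, b)):
            edges.append((u, v))
    return n, edges
```

For example, `sample_cga(H=4, b=2, c=1.5, seed=0)` samples a graph on 16 leaves. The loop visits every pair, so this sketch is only meant for small $H$.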

The edge probability function chosen here is the only natural choice. To see this, set $\Pr((u, v) \in E) =
f(h(u, v))$ for some function $f$. For $G$ to have a power-law degree sequence, we require that $f(h)/f(h-1)$
is constant. Hence, we must have $f(h) = \gamma c^{-h}$, where $c > 1$ is the shrinking parameter, and $\gamma \le 1$
indicates the initial density. For simplicity, we will take $\gamma = 1$, but the work that follows can be adapted
for any value of $\gamma \le 1$.
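The step from a constant ratio to the exponential form is a short induction on $h$. Writing $r$ for the constant ratio (a symbol introduced only for this sketch):

$$\frac{f(h)}{f(h-1)} = r \quad (h \ge 1) \qquad\Longrightarrow\qquad f(h) = f(0)\, r^{h} = \gamma\, c^{-h}, \qquad \text{where } \gamma = f(0),\ c = 1/r.$$

Requiring edge probabilities to decrease with height means $r < 1$, that is, $c > 1$, and $\gamma = f(0)$ plays the role of the initial density.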



We say an event occurs asymptotically almost surely (a.a.s.) if the probability that the event occurs approaches one as
$n \to \infty$.

Intuitively, each internal node of $T$ defines a sub-community of $G$, given by the leaf nodes of the
sub-tree of T rooted at that internal node. The larger the sub-tree, the larger and less connected (on 
average) the sub-community. 

We now define some terminology used within the CGA model: 

Definition. Let $M \subseteq V(G)$, so that $M$ corresponds to a subset of the leaves of the tree $T$. Define
the height of $M$ to be the height of the minimum complete subtree in $T$ containing all of $M$. If a set
$M$ has height $h$, then we will call $M$ complete if it has $b^h$ nodes. For each $h' \ge h$, $M$ is a subset of
exactly one complete set of height $h'$; we will denote this set by $S(M, h')$. $S(M, h)$ is the minimum
complete set containing $M$, and will usually be denoted $S(M)$. If $u$ is a leaf node of $T$ not contained in
$S(M)$, notice that there exists a $j > h$ such that for all $v \in S(M)$, $h(u, v) = j$. Therefore, we will
define $h(u, M) = j$ and refer to it as the height of $u$ from $M$.
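To make this terminology concrete, here is a Python sketch of the height of a set, $S(M, h')$, completeness, and $h(u, M)$. It assumes (a convention of this sketch only) that leaves are numbered $0, \dots, n-1$ from left to right, so the complete set of height $k$ containing leaf $u$ is the block of $b^k$ consecutive leaves starting at $\lfloor u/b^k \rfloor \, b^k$. All function names are hypothetical.

```python
def set_height(M, b):
    """Height of M: the height of the smallest complete subtree containing every leaf in M."""
    blocks = set(M)
    h = 0
    # Moving up one level maps leaf index u to u // b; stop once all of M shares one block.
    while len(blocks) > 1:
        blocks = {u // b for u in blocks}
        h += 1
    return h


def complete_set(M, h_prime, b):
    """S(M, h'): the unique complete set of height h' containing M, as a range of leaves."""
    if h_prime < set_height(M, b):
        raise ValueError("h' must be at least the height of M")
    size = b ** h_prime
    start = (next(iter(M)) // size) * size
    return range(start, start + size)


def is_complete(M, b):
    """M is complete when it has exactly b**h elements, where h is its height."""
    return len(set(M)) == b ** set_height(M, b)


def height_from_set(u, M, b):
    """h(u, M) for a leaf u outside S(M).

    Adding such a u to M raises the height to exactly the common value h(u, v), v in S(M).
    """
    return set_height(set(M) | {u}, b)
```

For instance, with $b = 2$ the set $\{4, 6\}$ has height 2, $S(\{4, 6\}) = \{4, 5, 6, 7\}$, and leaf 0 is at height 3 from it.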

Intuitively, the height of a pair of (leaf) nodes gives a prediction of the similarity between those two 
nodes. Notice that the height of a set is the maximum height over all pairs of vertices in the set. 

The above definitions are for the undirected version of the CGA model. The directed CGA model 
is only a minor modification from this: instead of a single edge between two vertices u and v, we will 
have two edges going in opposite directions, which occur independently with equal probability $c^{-h(u, v)}$.
This version was the one originally proposed by Leskovec et al. [34]. In this paper, we work with the
undirected version, but all conclusions hold in both forms of the model with only trivial modifications. 

2.2. $(\alpha, \beta)$-clusters. Intuitively, a cluster is a set of vertices that is more edge-dense than the graph
average. However, the exact definition chosen often varies with application. For the real-world networks
we consider, allowing overlapping clusters is natural [42, 3]. In other words, a single vertex should be
allowed membership in more than one cluster. Furthermore, it may be possible that a vertex is not a
member of any clusters at all. In 2007, Mishra et al. [42] proposed a clustering definition for social
networks (and applying to other real-world graph applications) that allows clusters to overlap. This 
definition is as follows: 

Definition. For an undirected graph $G$, let $v \in V(G)$ and $M \subseteq V(G)$, and let $e(v, M)$ denote the
number of edges between $v$ and $M$. For parameters $0 < \alpha, \beta < 1$, we say $M$ is internally dense if for all
$v \in M$, $e(v, M) > \beta |M|$, and is externally sparse if for all $u \notin M$, $e(u, M) < \alpha |M|$. An $(\alpha, \beta)$-cluster
is a set that is both internally dense and externally sparse.
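The definition translates directly into a check on a given graph. The sketch below assumes an adjacency map from each vertex to the set of its neighbours (a representation chosen for this example) and uses the strict inequalities exactly as written in the definition; `is_cluster` is a hypothetical name.

```python
def is_cluster(adj, M, alpha, beta):
    """Check whether M is an (alpha, beta)-cluster of the undirected graph adj.

    adj: dict mapping every vertex to the set of its neighbours.
    M: candidate vertex set.
    """
    M = set(M)
    size = len(M)
    # Internally dense: every v in M has more than beta*|M| neighbours inside M.
    dense = all(len(adj[v] & M) > beta * size for v in M)
    # Externally sparse: every u outside M has fewer than alpha*|M| neighbours in M.
    sparse = all(len(adj[u] & M) < alpha * size for u in adj if u not in M)
    return dense and sparse


# Triangle {0, 1, 2}, with vertex 3 attached to 2 and vertex 4 attached to 3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4}, 4: {3}}
print(is_cluster(adj, {0, 1, 2}, alpha=0.5, beta=0.5))
```

Lowering $\alpha$ to 0.3 makes vertex 3 (one neighbour in the triangle) violate external sparseness, and raising $\beta$ to 0.7 breaks internal density.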

We will often refer to an $(\alpha, \beta)$-cluster as just a cluster, with the parameters $\alpha$ and $\beta$ being implicit.
Note that for a cluster to necessarily be connected, we would have to require $\beta > \frac12$. For this reason,
Mishra et al. restrict $\beta$ to this range, but nothing in this work places any restriction on $\beta$. As well, it is
natural to have $\alpha < \beta$, but we do not require this either.

Within the CGA model, clusters may be described with the same terminology as vertex sets: 

Definition. A cluster of height $h$ is a complete cluster if it has $b^h$ vertices.

By using outgoing edges, the definition of $(\alpha, \beta)$-cluster carries over to directed networks:

Definition. For a directed graph $G$, let $v \in V(G)$ and $M \subseteq V(G)$, and let $e_d(v, M)$ denote the
number of edges from $v$ to $M$. For parameters $0 < \alpha, \beta < 1$, we say $M$ is internally dense if for all
$v \in M$, $e_d(v, M) > \beta |M|$, and is externally sparse if for all $u \notin M$, $e_d(u, M) < \alpha |M|$. A directed
$(\alpha, \beta)$-cluster is a set that is both internally dense and externally sparse.

Note that the directed form of the clustering definition depends on only outgoing edges. This corresponds
to situations where cluster membership depends only on outgoing intent (for example, online
social networks where users may "subscribe" to other users). While we will work in the undirected CGA 
model using the undirected definition of clustering, our results hold as well for the directed CGA model 
using the directed definition of clustering. 
3. Current Work

Our goal is to establish a size threshold of $(\ln n)^{1/2}$ for the existence of clusters in both the directed
and undirected versions of the CGA model. We will work in the undirected version of the model, but all 
statements hold with trivial modifications in the directed model. Our main theorem is stated as follows: 

Theorem 1. Let $G$ be a graph chosen according to the undirected CGA model. Then for all $0 < \alpha, \beta < 1$
and $\epsilon > 0$:

a: Let $m^* = \frac{\ln b}{\alpha \ln c}$. There a.a.s. exist $(\alpha, \beta)$-clusters of size larger than $(\ln n)^{1/2-\epsilon}$ in $G$. Moreover,
there exists a constant $\gamma = \gamma(\alpha, b, c) > 0$ such that for each $h$ satisfying $\log_b m^* < h <
(\frac12 - \epsilon)\frac{\ln \ln n}{\ln b}$, there a.a.s. exist at least $(\ln n)^{\gamma b^h}$ complete $(\alpha, \beta)$-clusters of size $b^h$.
b: There are a.a.s. no $(\alpha, \beta)$-clusters with more than $(\ln n)^{1/2+\epsilon}$ vertices.

We will try to give some intuition towards why such a threshold might occur. It turns out (see Lemma 3)
that complete sets of increasing (i.e., non-constant) size tend to be externally sparse; we will attempt
to intuitively justify why sets below the threshold are internally dense.

To give intuition, we therefore make the following simplifications. First, given a set $M$ of fixed size $b^h$,
$M$ is more likely to be internally dense if its height is small, because short-range edges are more likely to
occur. In the most extreme case, $M$ has height $h$ and size $b^h$, and forms a complete set. For now, we
will consider only sets of this form. Second, instead of considering whether $M$ is internally dense, we will
examine the probability of the stronger event of $M$ being a clique. Finally, let us pretend all vertices in
$M$ are at height $h$ from each other, so that the probability of any edge inside $M$ occurring is $c^{-h}$. Note
that this last simplification is not so extreme: for any vertex $v \in M$, $(1 - \frac1b)|M|$ of the vertices in $M$ (that
is, a large fraction) are at height exactly $h$ from $v$; of the remaining $|M|/b$ vertices, most have height close to
$h$.
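The counting claim in the paragraph above (that most vertices of a complete set of height $h$ are at height exactly $h$ from any given vertex) can be checked by brute force. The hypothetical helper below counts, for a vertex $v$ of the complete set $\{0, \dots, b^h - 1\}$, how many other vertices lie at each height; it assumes the left-to-right leaf numbering used only for this illustration. The count at height $h$ should be $(b-1)b^{h-1} = (1 - \frac1b)|M|$.

```python
def height_profile(v, h, b):
    """counts[j] = number of leaves of {0, ..., b**h - 1} at height exactly j from v."""
    counts = [0] * (h + 1)
    for u in range(b ** h):
        if u == v:
            continue
        x, y, j = u, v, 0
        # Climb the tree until u and v fall into the same subtree.
        while x != y:
            x //= b
            y //= b
            j += 1
        counts[j] += 1
    return counts


print(height_profile(0, 3, 3))
```

The profile does not depend on the choice of $v$: there are $(b-1)b^{j-1}$ vertices at height $j$, so the top level alone holds a $1 - \frac1b$ fraction of $M$.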

Since there are $\binom{b^h}{2}$ potential edges in $M$, the probability that $M$ is a clique is therefore at least

$$(c^{-h})^{\binom{b^h}{2}} \ge c^{-h b^{2h}}.$$

There are $n/b^h$ disjoint complete sets of height $h$, so if we let $X$ denote the number of complete cliques
of height $h$, we have that

$$\mathbb{E}[X] \ge \frac{n}{b^h}\, c^{-h b^{2h}} \approx \exp\left(\ln n - h b^{2h} \ln c\right). \qquad (3.1)$$

(Of course, this is not really correct due to the simplifications we have made, but it turns out to be close
enough to give the right asymptotic value.) For the expected number of cliques to be growing with $n$, we
require the positive term in the exponential in (3.1) to be growing faster than the negative term. Suppose
the height is given by $h = a \frac{\ln \ln n}{\ln b}$ for some constant $a$, so that the size of $M$ is $b^h = (\ln n)^a$. Then (3.1)
shows that the expected number of cliques will grow precisely when $a < \frac12$, the value of our cluster size
threshold.
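To make the last step explicit (a short derivation using only the quantities in (3.1)): with $h = a\frac{\ln \ln n}{\ln b}$ we have $b^{2h} = (\ln n)^{2a}$, so

$$\ln n - h\, b^{2h} \ln c = \ln n \left(1 - a\,\frac{\ln c}{\ln b}\,(\ln n)^{2a-1} \ln \ln n\right).$$

For $a < \frac12$ the factor $(\ln n)^{2a-1}\ln \ln n$ tends to $0$, so the exponent grows like $\ln n$; for $a \ge \frac12$ the same factor tends to infinity, so the exponent tends to $-\infty$. This is the dividing line $a = \frac12$ stated above.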

Of course, the preceding justification considers only complete sets, and ignores many details such as 
external sparseness. Many other types of sets may exist, and it is natural to suspect they could form 
clusters with sizes larger than $(\ln n)^{1/2+\epsilon}$. Much of the work in proving part b) of Theorem 1 is in showing
that this is not the case. 

Let us try to give intuition for why part b) of Theorem 1 should be true; that is, why clusters larger
than $(\ln n)^{1/2+\epsilon}$ should not exist. As noted, the sets most likely to be internally dense are those with small
heights, because the probability of edges is higher. In this simplified explanation, we will consider only
sets of this form. Let $S$ be a complete set of height $h = (\frac12 + \epsilon)\frac{\ln \ln n}{\ln b}$, so that $S$ has $b^h = (\ln n)^{1/2+\epsilon}$
vertices and $\binom{b^h}{2} \approx (\ln n)^{1+2\epsilon}$ possible edges. Of these edges, only $b^{h/2}\binom{b^{h/2}}{2} = o(\ln n)$ have height $\frac{h}{2}$ or
less. Ignoring these edges, which are negligible in number, the remaining edges occur with probability at
most $c^{-h/2}$. This turns out to be small enough to show that a.a.s. there are no complete sets like $S$ with
more than $\ln n$ edges.
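The claim that only $o(\ln n)$ pairs of $S$ have small height can be checked directly; this sketch assumes, for simplicity, that $h$ is even. $S$ splits into $b^{h/2}$ complete subsets of height $h/2$, and a pair of vertices has height at most $h/2$ exactly when both lie in the same subset. With $b^h = (\ln n)^{1/2+\epsilon}$, the number of such pairs is therefore

$$b^{h/2}\binom{b^{h/2}}{2} \le \frac12\, b^{3h/2} = \frac12\,(\ln n)^{\frac34 + \frac{3\epsilon}{2}},$$

which is $o(\ln n)$ as long as $\epsilon < \frac16$. This suffices, since proving part b) for small $\epsilon$ also covers larger $\epsilon$. Every other pair has height greater than $h/2$, and is therefore an edge with probability less than $c^{-h/2}$.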

That is, each complete set $S$ of height $h = (\frac12 + \epsilon)\frac{\ln \ln n}{\ln b}$ does not contain many edges. Given a set
$M$, edges from $M \cap S$ to $M \setminus S$ occur with low probability, because the height between these two sets is
high. If the intersection $M \cap S$ is large enough, then $M$ also cannot have many edges contained within
$S$, because $S$ itself does not have many edges. On the other hand, if $M$ does not overlap any complete
set $S$ significantly, then $M \cap S$ is small enough to be ignored. In other words, if $M$ is large enough, then
$M$ cannot be internally dense, and is therefore not a cluster.

Although this explanation gives the intuition behind the core idea of the proof of Theorem 1, the actual
proof is much more complicated, requiring multiple steps to complete.

Our result holds as well in the directed version of the CGA model using the directed definition of 
clusters. This version of the theorem is stated as follows: 

Theorem 2. Let $G$ be a graph chosen according to the directed CGA model. Then for all $0 < \alpha, \beta < 1$
and $\epsilon > 0$:

a: Let $m^* = \frac{\ln b}{\alpha \ln c}$. There a.a.s. exist directed $(\alpha, \beta)$-clusters of size larger than $(\ln n)^{1/2-\epsilon}$ in
$G$. Moreover, there exists a constant $\gamma > 0$ such that for each $h$ satisfying $\log_b m^* < h <
(\frac12 - \epsilon)\frac{\ln \ln n}{\ln b}$, there a.a.s. exist at least $(\ln n)^{\gamma b^h}$ complete directed $(\alpha, \beta)$-clusters of size $b^h$.

b: There are a.a.s. no $(\alpha, \beta)$-clusters with more than $(\ln n)^{1/2+\epsilon}$ vertices.

Remark. In fact, the directed version is easier to work with than the undirected one, due to the added
independence of having one potential edge in each direction between each node pair. A different argument
than the one given here strengthens the number of clusters given in part a) of Theorem 2: we can show
that a.a.s. there exist at least … complete directed $(\alpha, \beta)$-clusters of size $b^h$.

We give the proof of Theorem 1 but not Theorem 2. The proof of Theorem 2 is nearly identical to
that of Theorem 1 and may be given with a few straightforward changes. The remainder of the paper
will be organized as follows. Section 4 lists related work. Section 5 gives several results establishing the
existence and non-existence of clusters of size less than $(\ln n)^{1/2}$, including a proof of part a) of Theorem 1.
Section 6 gives a proof of part b) of Theorem 1.

4. Related Work 

4.1. Random Models for Real-World Networks. Models for real-world networks have evolved in response
to several important empirical observations. One of these is the small-world effect, the observation
that network diameters are smaller than one might expect [52, 15, 41]. Watts and Strogatz [52] and
Kleinberg [30] proposed models that add random edges to a regular network to reduce the diameter.
Another key property is the scale-free property: degree distributions tend to follow a power law (that is,
the proportion of nodes with degree $d$ is proportional to $d^{-\xi}$ for some constant $\xi > 0$). This
was first observed in the Internet graph [21], and later in phone call graphs [1] and the web graph.
Preferential attachment models [11], in which a network grows by adding new edges with a preference
for attachment to nodes with high degree, give one approach to explaining this. Another approach
is edge copying [29, 31], in which newly added vertices copy the edges of existing vertices. Many other
models exist (e.g. [5, 51, 37, 11, 28, 19]).

In 2005, Leskovec, Kleinberg, and Faloutsos [34] observed that average node degree increased polynomially
as the network grows. Prior to this, models had assumed a constant (or possibly logarithmically
increasing) average node degree. They proposed two models with densification: the Community Guided
Attachment (CGA) model used in this work, and a "forest fire" model similar to edge copying. Another
approach for densifying models is based on Kronecker graphs [36, 38]. Lattanzi and Sivakumar [32]
recently gave a model based on affiliation networks, in which each node is affiliated with some number
of "societies".

4.2. Clustering. Detection of clusters is of great importance in sociology, biology, and computer science.
Fortunato [23] gives a good review of some of the many hundreds of published works on the topic.
Depending on the application, the definition of what constitutes a cluster can vary greatly. Empirical
studies show that clustering is present in real-world networks [52, 11, 35], and that these clusters often
overlap [47, 4]: that is, a single node in a real-world network may be part of multiple clusters at once.

Most common approaches to clustering do not allow overlapping clusters. The
most popular approach to overlapping communities is the clique percolation method [3, 22, 33]. In this
method, two $k$-cliques overlap if they share $k - 1$ vertices. A $k$-clique community (cluster) is the union of
a $k$-clique with all other $k$-cliques that overlap it. One problem with this approach is that it is not clear
initially which value of $k$ should be chosen. Additionally, it presumes the existence of many $k$-cliques,
which may not be the case. Mishra, Schreiber, Stanton, and Tarjan's [42] $(\alpha, \beta)$-clusters (the clustering
definition used in this work) avoid these problems by instead parameterizing the fraction of edges that
should be present inside the cluster (the parameter $\beta$). Additionally, they introduce the notion of external
sparseness (the parameter $\alpha$). Other approaches to overlapping clusters exist (e.g. [20, 54, 48]).

Investigations into the size of clusters in real-world networks show that the tail of the cluster size 
distribution may follow a power law [47, 45]. In other words, the relative sizes of the larger clusters
in a network follow a certain distribution; unlike the current work, no observation is made about size
relative to the overall size of the network. Recently, Leskovec, Lang, Dasgupta, and Mahoney [35] found
empirical evidence for the existence of a size threshold for the "best" clusters in the network: beyond that
size, the quality of clusters declines. This is also discussed in Section 1.

Despite the importance of clustering and the proliferation of random models for real-world networks, 
we are aware of no work that studies clustering analytically in any random model for real-world networks. 

5. The Existence of Small Clusters 

To prove the existence of clusters, we must establish that both external sparseness and internal denseness
occur. We begin with external sparseness. Let $h^* = (\frac12 - \epsilon)\frac{\ln \ln n}{\ln b}$. This value is important because
the number of nodes in a complete set of height $h^*$ is equal to $b^{h^*} = (\ln n)^{1/2-\epsilon}$, the bound in Theorem 1 a).
Let $M$ be a set with height $h < h^*$ and size $m$. Recall that $S(M)$ is the complete set of height
$h$ containing $M$, and $S(M, h^*)$ is the complete set of height $h^*$ containing $M$. For $M$ to be externally
sparse, we require three events to occur, defined as follows:

• $E_1(M)$: The vertices of $S(M) \setminus M$ must satisfy the external sparseness property with respect
to $M$. That is, $\forall u \in S(M) \setminus M$, $e(u, M) < \alpha m$.

• $E_2(M)$: The vertices of $S(M, h^*) \setminus S(M)$ must satisfy the external sparseness property with
respect to $M$. That is, $\forall u \in S(M, h^*) \setminus S(M)$, $e(u, M) < \alpha m$.

• $E_3(M)$: The vertices of $G \setminus S(M, h^*)$ must satisfy the external sparseness property with respect
to $M$. That is, $\forall u \in G \setminus S(M, h^*)$, $e(u, M) < \alpha m$.
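The three-way split can be made concrete on a single sampled graph. The sketch below evaluates whether $E_1(M)$, $E_2(M)$ and $E_3(M)$ hold for one realized graph (it checks the events, not their probabilities). It assumes an adjacency map over leaves numbered $0, \dots, n-1$ from left to right, a convention of this illustration only; `sparseness_events` is a hypothetical name.

```python
def sparseness_events(adj, M, h_star, b, alpha):
    """Return (E1, E2, E3) for a leaf set M of height at most h_star.

    adj maps every leaf 0..n-1 to the set of its neighbours.
    """
    M = set(M)
    m = len(M)
    # Height of M: collapse leaves level by level until one block remains.
    level, h = M, 0
    while len(level) > 1:
        level = {u // b for u in level}
        h += 1
    if h > h_star:
        raise ValueError("M must have height at most h_star")
    anchor = next(iter(M))

    def block(k):
        # Leaves of the complete set of height k containing M.
        start = (anchor // b ** k) * b ** k
        return set(range(start, start + b ** k))

    S, S_star = block(h), block(h_star)

    def sparse(region):
        # Every vertex in region has fewer than alpha*m neighbours in M.
        return all(len(adj[u] & M) < alpha * m for u in region)

    return sparse(S - M), sparse(S_star - S), sparse(set(adj) - S_star)
```

$M$ is externally sparse exactly when all three returned values are `True`, mirroring $E_1 \cap E_2 \cap E_3$.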

When the set $M$ is clear, we will sometimes denote these events as $E_1$, $E_2$, and $E_3$, without the
parenthetical argument. Together, $E_1 \cap E_2 \cap E_3$ forms the event that $M$ is externally sparse. This
division will be used to show in steps when these different parts of the external sparseness property occur.
In particular, this lessens the problem of dependence between sets. To see this, let $M_1$ and $M_2$ be sets
such that $S(M_1, h^*)$ and $S(M_2, h^*)$ do not intersect. Because these sets do not intersect, the events
$E_1(M_1)$, $E_1(M_2)$, $E_2(M_1)$, and $E_2(M_2)$ are all independent; the only events with dependence between
each other are $E_3(M_1)$ and $E_3(M_2)$.

The following lemma establishes when E2 can occur: 

Lemma 3. Let $m^* = \frac{\ln b}{\alpha \ln c}$ and $h^* = (\frac12 - \epsilon)\frac{\ln \ln n}{\ln b}$. Then:

a: There are a.a.s. no externally sparse sets of size smaller than $m^*$.

b: If $M$ is a set of size $m > m^*$ and height $h < h^*$, then there exists a constant $\sigma > 0$ such that
$\Pr(E_2(M)) > \sigma$.

The proof of Lemma 3 may be found in the appendix; a) follows from a first moment argument, and
b) comes from the application of a concentration bound.
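As a rough illustration of where a constant-size threshold for external sparseness comes from (a heuristic calculation, not the appendix proof), fix a set $M$ of constant size $m$ and a vertex $u$ at height $j$ from $M$, and write $k = \lceil \alpha m \rceil$ for the smallest number of edges into $M$ that violates $e(u, M) < \alpha m$. Each of the $m$ potential edges from $u$ to $M$ is present independently with probability $c^{-j}$, so

$$c^{-jk} \le \Pr\big(e(u, M) \ge k\big) \le \binom{m}{k} c^{-jk}.$$

There are $(b-1)b^{j-1}$ vertices at height $j$ from $M$, so for constant $m$ the expected number of vertices at height $j$ that violate external sparseness is $\Theta\big((b\, c^{-k})^{j}\big)$. This decays geometrically in $j$ when $c^{k} > b$ and grows with $j$ when $c^{k} < b$; ignoring the rounding in $k$, the dividing line is $\alpha m = \frac{\ln b}{\ln c}$.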

Because of the dependence problem, dealing with $E_3$ is more complicated. To give intuition, consider
the following scenario. Suppose $G$ is partitioned into complete sets of height $h^*$, and from each of these
sets we choose at most a single subset of size $m$. Call this resulting collection of subsets $\mathcal{M}$. Any single
one of those subsets is likely to have $E_3$, because the probability of edges from that set to any of the
other sets is small. In fact, by dealing with dependence appropriately, we can show that a.a.s. at least

$$\min\left(|\mathcal{M}|,\ (\ln n)^{\frac{\alpha \ln c}{4 \ln b}(m - m^*)}\right) \qquad (5.1)$$

of the sets in $\mathcal{M}$ have $E_3$ simultaneously. The proof of (5.1) may be found in the appendix.
We use (5.1) and Lemma 3 to prove the following corollary:

Corollary 4. Let $m^* = \frac{\ln b}{\alpha \ln c}$ and let $h$ be a fixed, constant value. Then:

a: For $m < m^*$, there are a.a.s. no clusters of size $m$.

b: For each $m$ such that $m^* < m < b^h$, there are a.a.s. at least $(\ln n)^{\frac{\alpha \ln c}{4 \ln b}(m - m^*)}$ clusters of size
$m$ and height $h$.

Note that Corollary 4 establishes a sharp size bound on the existence of small, constant-size clusters.
We give the proof of Corollary 4.

Proof. Because a set must be externally sparse to be a cluster, part a) follows directly from Lemma 3 a).
To show part b), let $M$ be a set of height $h$ and size $m > m^*$, and let $D$ denote the event
that $M$ is internally dense. Because there are only a constant number of vertices in $M$, $M$ has $D$ with
constant probability. Similarly, there are only a constant number of vertices in $S(M) \setminus M$, so $M$ has $E_1$
with constant probability. By Lemma 3 b), $M$ has $E_2$ with at least constant probability. Hence, there is
a constant $q > 0$ such that $\Pr(D \cap E_1 \cap E_2) > q$.

Now, partition $G$ into $n/b^{h^*}$ complete sets of height $h^* = (\frac12 - \epsilon)\frac{\ln \ln n}{\ln b}$, and from each one choose a
set of height $h$ and size $m$. Because each chosen set lies in a different complete set of height $h^*$, each has $D \cap E_1 \cap E_2$
with probability at least $q$, independently. Let those sets that have $D \cap E_1 \cap E_2$ form the collection
of sets $\mathcal{M}$. The expected size of $\mathcal{M}$ is $qn/b^{h^*} = \Omega(n^{3/4})$, and a concentration bound (see appendix)
shows that a.a.s. $|\mathcal{M}| > n^{3/4}$. Applying (5.1), we see that at least $(\ln n)^{\frac{\alpha \ln c}{4 \ln b}(m - m^*)}$ of the sets in $\mathcal{M}$
are clusters. □

We are now ready to show how (5.1) proves part a) of Theorem 1, which we now restate:

Let $m^* = \frac{\ln b}{\alpha \ln c}$. There a.a.s. exist clusters of size larger than $(\ln n)^{1/2-\epsilon}$ in $G$. Moreover,
there exists a constant $\gamma = \gamma(\alpha, b, c) > 0$ such that for each $h$ satisfying $\log_b m^* < h <
(\frac12 - \epsilon)\frac{\ln \ln n}{\ln b}$, there a.a.s. exist at least $(\ln n)^{\gamma b^h}$ complete clusters of size $b^h$.

Proof. Note that the second part of the theorem statement implies the first, because a complete set of
height $h^* = (\frac12 - \epsilon)\frac{\ln \ln n}{\ln b}$ has $b^{h^*} = (\ln n)^{1/2-\epsilon}$ vertices. Let $M$ be a complete set of height $h$. We will
first examine the event $D$ that $M$ is internally dense.

In fact, let us consider the stronger event of $M$ being a clique. Each potential edge in $M$ occurs with
probability at least $c^{-h}$, so the probability of all $\binom{b^h}{2} < b^{2h}$ edges occurring is at least $c^{-h b^{2h}}$. Hence,
$\Pr(D) \ge \exp(-\ln(c)\, h b^{2h})$.

Now, partition $G$ into complete sets of height $h^* = (\frac12 - \epsilon)\frac{\ln \ln n}{\ln b}$. From each such set, choose a single
complete set of height $h$, and let these $n/b^{h^*}$ disjoint sets form the family $\mathcal{A}$. Because these sets are
complete, they automatically have $E_1$. Because they do not overlap and lie in different sets of height $h^*$,
they will have $D$ and $E_2$ independently. Lemma 3 b) implies they have $E_2$ with probability at least $\sigma$ for
some positive constant $\sigma$. Hence, for every $M \in \mathcal{A}$ we have that

$$\Pr(D \cap E_1 \cap E_2) \ge \Pr(D)\Pr(E_1)\Pr(E_2) \ge \sigma \exp\left(-\ln(c)\, h b^{2h}\right). \qquad (5.2)$$

Now, let $X$ denote the number of sets in $\mathcal{A}$ with $D \cap E_1 \cap E_2$. A fairly straightforward calculation (found
in the appendix) shows that $\mathbb{E}[X]$ is asymptotically larger than … and moreover that $\Pr(X < \ldots) < \exp(-n^{1/3})$.
In other words, a.a.s. there are at least … sets in $\mathcal{A}$ with $D \cap E_1 \cap E_2$. Let $\mathcal{M} \subseteq \mathcal{A}$ be
this sub-family of sets. (5.1) implies that at least $(\ln n)^{\frac{\alpha \ln c}{4 \ln b}(b^h - m^*)}$ of the sets in $\mathcal{M}$ also have $E_3$, and
therefore are clusters. Recall the constant $m^*$ from Lemma 3, and let $h_{\min}$ be the minimum integral value of $h$ such
that $b^h > m^*$; setting

$$\gamma = \frac{\alpha \ln c\, \left(b^{h_{\min}} - m^*\right)}{4 \ln b \cdot b^{h_{\min}}}$$

guarantees that $(\ln n)^{\gamma b^h} \le (\ln n)^{\frac{\alpha \ln c}{4 \ln b}(b^h - m^*)}$, and hence there are a.a.s. at least $(\ln n)^{\gamma b^h}$ clusters of
height $h$ in $G$. □
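The final step, that this choice of $\gamma$ makes $\gamma b^h$ no larger than the exponent supplied by (5.1) for every admissible $h$, can be spot-checked numerically. In the sketch below, $m^*$ is simply passed in as a number, and the function names and sample parameters are hypothetical. The inequality holds because $1 - m^*/b^{h_{\min}} \le 1 - m^*/b^h$ whenever $h \ge h_{\min}$.

```python
import math


def gamma_const(alpha, b, c, m_star):
    """Return (gamma, h_min): h_min is the least integer h with b**h > m_star, and
    gamma = alpha*ln(c)*(b**h_min - m_star) / (4*ln(b)*b**h_min)."""
    h_min = 0
    while b ** h_min <= m_star:
        h_min += 1
    top = b ** h_min
    return alpha * math.log(c) * (top - m_star) / (4 * math.log(b) * top), h_min


def exponent_bound(alpha, b, c, m_star, h):
    """Exponent alpha*ln(c)*(b**h - m_star) / (4*ln(b)) appearing in (5.1) for sets of size b**h."""
    return alpha * math.log(c) * (b ** h - m_star) / (4 * math.log(b))


g, h_min = gamma_const(0.5, 2, 3, 2.5)
for h in range(h_min, h_min + 5):
    print(h, g * 2 ** h, exponent_bound(0.5, 2, 3, 2.5, h))
```

Equality holds at $h = h_{\min}$, and the gap widens as $h$ grows.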

6. The Non-Existence of Large Clusters 

The goal of this section is to prove part b) of Theorem 1: that, for all $\epsilon > 0$, there are no clusters
with more than $(\ln n)^{1/2+\epsilon}$ vertices.

The intuition of the proof may be outlined as follows. A set of a given size is more likely to form a 
cluster if the height of that set is small. More generally, a set is more likely to be a cluster if a large subset 
of the set has a small height, so that edges within that subset are more common. We will concentrate 
on ruling out these types of clusters. 

We use the term thick to capture this notion of a set with small height containing many vertices. We will consider the following two types of thick sets:

Definition.

• A short ε-thick set is a set with height at most $h_\epsilon = (\frac12 + \epsilon)\frac{\ln\ln n}{\ln b}$ and containing at least $(\ln n)^{1/2+\epsilon/3}$ vertices.

• A tall ε-thick set is a set with height at most $\frac{(\ln n)^{1/2}}{\ln b}$ and containing at least $(\ln n)^{1/2+\epsilon/2}$ vertices.

Note that the height $h_\epsilon$ is the height of a complete set with $(\ln n)^{1/2+\epsilon}$ nodes, as in Theorem 1 b).
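To get a feel for these scales, the Python sketch below evaluates the quantities in the definition for given n, b and ε: the short height $h_\epsilon$, the tall height $\frac{(\ln n)^{1/2}}{\ln b}$, the two vertex thresholds, and the sizes of complete sets of those heights. Heights are kept as real numbers; rounding to integers is deliberately left out of this illustration.

```python
import math


def thick_thresholds(n: float, b: int, eps: float) -> dict:
    """Heights and sizes used in the definitions of short and tall eps-thick sets."""
    lnn = math.log(n)
    h_eps = (0.5 + eps) * math.log(lnn) / math.log(b)  # height bound for short eps-thick sets
    h_tall = math.sqrt(lnn) / math.log(b)              # height bound for tall eps-thick sets
    return {
        "h_eps": h_eps,
        "short_min_size": lnn ** (0.5 + eps / 3),
        "h_tall": h_tall,
        "tall_min_size": lnn ** (0.5 + eps / 2),
        "complete_set_size_at_h_eps": b ** h_eps,    # equals (ln n)^(1/2 + eps)
        "complete_set_size_at_h_tall": b ** h_tall,  # equals exp(sqrt(ln n))
    }
```

The last entry, $\exp((\ln n)^{1/2})$, is the number of vertices in a complete set of the tall height, which is what drives the counting of candidate sets in the proofs of Lemmas 6 to 8.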

Let Q be a set of size $q \ge (\ln n)^{1/2+\epsilon}$. There are two cases: either Q contains no tall ε-thick sets, or it contains at least one such set as a subset. Existence of clusters in the former case is easy to rule out: given any vertex $v \in Q$, we must have at least $\beta q$ edges from v to other vertices in Q for Q to be a cluster. Since there is no tall ε-thick set, at most $(\ln n)^{1/2+\epsilon/2} = o(q)$ of these vertices are "close", that is, within height $\frac{(\ln n)^{1/2}}{\ln b}$ of v. Even assuming all the edges from v to the close vertices exist, we must have at least $\frac{\beta}{2}q$ edges from v to the far vertices in Q, which occur with low probability.

The latter case is harder. Let us write $Q = T \cup R$, where T is a tall ε-thick set, and R contains the other vertices. Given $v \in T$, we must also rule out the possibility of there being many edges between v and T, which occur with much higher probability than edges between v and R.
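The case analysis above rests on the two defining conditions of an (α, β)-cluster as they are used throughout this section: every vertex of Q needs at least β|Q| neighbours in Q (internal density), and every vertex outside Q may have at most α|Q| neighbours in Q (external sparsity). A minimal Python sketch of that test on an adjacency-set graph follows; the function names, including `e`, which mirrors the notation e(v,S), are illustrative and not taken from the paper.

```python
from typing import Dict, Hashable, Set

Graph = Dict[Hashable, Set[Hashable]]


def e(graph: Graph, v: Hashable, s: Set[Hashable]) -> int:
    """Number of edges from vertex v to the vertex set s."""
    return len(graph[v] & s)


def is_internally_dense(graph: Graph, q: Set[Hashable], beta: float) -> bool:
    """Every vertex of q has at least beta*|q| neighbours inside q."""
    return all(e(graph, v, q) >= beta * len(q) for v in q)


def is_externally_sparse(graph: Graph, q: Set[Hashable], alpha: float) -> bool:
    """Every vertex outside q has at most alpha*|q| neighbours inside q."""
    return all(e(graph, u, q) <= alpha * len(q) for u in graph if u not in q)


def is_cluster(graph: Graph, q: Set[Hashable], alpha: float, beta: float) -> bool:
    """(alpha, beta)-cluster test in the sense used in this section."""
    return is_internally_dense(graph, q, beta) and is_externally_sparse(graph, q, alpha)
```

For example, in a triangle {0, 1, 2} with a pendant vertex 3 attached to 2, the triangle is a (0.4, 0.6)-cluster but not a (0.3, 0.6)-cluster, because vertex 3 has one neighbour in it and 1 > 0.3 · 3.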

This is done by repeating, for short ε-thick sets, the argument used above for tall ε-thick sets. T will either contain a short ε-thick set as a subset, or it will not. Again, the latter case is easier to deal with, because most of the required edges for Q to form a cluster are of height $h_\epsilon = (\frac12 + \epsilon)\frac{\ln\ln n}{\ln b}$ or more, and so are unlikely to occur. In the former case, we partition $T = M \cup K$ so that M is the subset of height $h_\epsilon$ with at least $(\ln n)^{1/2+\epsilon/3}$ vertices, and K contains the remaining vertices. We now have $Q = M \cup K \cup R$; a simplified form of the argument for this case is as follows. We will show that there is some $v \in M$ such that $e(v,M) < \frac{\beta}{2}m$, $e(v,K) < \frac{\beta}{2}k$, and $e(v,R) \le \beta r$, together implying that

$$e(v,Q) < \beta q.$$

However, some care is needed to make the asymptotics of this argument work. There are more than $\binom{n}{q}$ choices for the set Q; if we divide $Q = M \cup K \cup R$ for each such set Q, there are far too many choices of the set M for a first moment bound to show directly that a.a.s. all sets M have the property we desire. For that reason, the argument will instead be given in the reverse order: first, we will show that a.a.s. all short ε-thick sets have the property we desire. This will be used to show that tall ε-thick sets also have a desired property. Finally, the result on tall ε-thick sets will be used to prove the clustering threshold for all sets of size at least $(\ln n)^{1/2+\epsilon}$ in general.

We point out the need to make similar arguments twice, once with a height of $h_\epsilon = (\frac12 + \epsilon)\frac{\ln\ln n}{\ln b}$, and once with a height of $\frac{(\ln n)^{1/2}}{\ln b}$. It is possible to show directly that a.a.s. all short ε-thick sets are not clusters, but the asymptotics of this argument will not work with a set of larger size. By splitting sets of height $\frac{(\ln n)^{1/2}}{\ln b}$ into those containing short ε-thick sets and those that do not, this in turn implies that tall ε-thick sets are also a.a.s. not clusters. Again, the asymptotics of this argument will not work for sets of larger size. Finally, we consider all sets with at least $(\ln n)^{1/2+\epsilon}$ vertices. By splitting the class of such sets into those that contain tall ε-thick sets and those that do not, we may finally show that no clusters with at least $(\ln n)^{1/2+\epsilon}$ vertices exist.

We begin by precisely stating the property of short ε-thick sets that we are interested in, and showing when it occurs:

Lemma 5. For all sufficiently small ε > 0, a.a.s. for each short ε-thick set M, there exists a set $M_1 \subseteq M$ such that $|M_1| \ge \frac23|M|$ and $\forall v \in M_1$, $e(v,M) < \frac{\beta}{4}|M|$.

The proof of Lemma 5 may be found in the appendix. Lemma 5 implies that there are a.a.s. no short ε-thick clusters. The stronger notion used here (that many vertices in M have fewer than $\frac{\beta}{4}|M|$ edges into M) is necessary for later steps of the proof.

The next lemma is the first step in showing that tall ε-thick sets (of height $\frac{(\ln n)^{1/2}}{\ln b}$) are not clusters:

Lemma 6. For all ε > 0, a.a.s. for every tall ε-thick set $T = M \cup K$, where M is a short ε-thick set and K is a set with $|K| \ge \frac{\beta}{4}|M|$ such that K does not intersect $S(M, h_\epsilon)$, the complete set of height $h_\epsilon = (\frac12 + \epsilon)\frac{\ln\ln n}{\ln b}$, there exists a set $M_2 \subseteq M$ such that $|M_2| \ge \frac23|M|$ and $\forall v \in M_2$, $e(v,K) < \frac{\beta}{2}|K|$.

The proof of Lemma 6 may be found in the appendix. Intuitively, Lemma 5 and Lemma 6 go together as follows: let $T = M \cup K$ be a tall ε-thick set containing a short ε-thick set M as well as some other vertices K. Lemma 5 implies that there are not enough edges within M for T to be internally dense, and Lemma 6 implies similarly that the edges from M to K are not sufficient for T to be internally dense. Taken together, this shows that a.a.s. there are no tall ε-thick clusters:

Lemma 7. For all sufficiently small ε > 0, a.a.s. for every tall ε-thick set T, there exists a set $T' \subseteq T$ such that $|T'| \ge (\ln n)^{1/2+\epsilon/4}$ and $\forall v \in T'$, $e(v,T) < \frac{\beta}{2}|T|$.

Because it is required for the proof of later steps, Lemma 7 shows a property stronger than that of not being a cluster. The proof of Lemma 7 is found in the appendix, but is sketched as follows:

Proof Sketch. First, suppose T contains a short ε-thick set M. Adding vertices in $S(M, h_\epsilon) \setminus M$ to M will preserve the property that M is a short ε-thick set, so we may assume that M is maximal. In other words, we may assume the set $T \setminus M$ does not intersect the complete set of height $h_\epsilon$, $S(M, h_\epsilon)$. Set $K = T \setminus M$, and let $m = |M|$, $k = |K|$ and $t = |T|$. By Lemma 5 there is a set $M_1 \subseteq M$ such that $|M_1| \ge \frac23 m$ and $\forall v \in M_1$, $e(v,M) < \frac{\beta}{4}m$. If $k < \frac{\beta}{4}m$, then even if every edge from v to K exists, we will still have $e(v, M \cup K) < \frac{\beta}{4}m + k < \frac{\beta}{2}m \le \frac{\beta}{2}t$ for every $v \in M_1$, and hence may choose $T' = M_1$. If $k \ge \frac{\beta}{4}m$, then by Lemma 6 there is some set $M_2 \subseteq M$ such that $|M_2| \ge \frac23 m$ and $\forall v \in M_2$, $e(v,K) < \frac{\beta}{2}k$. Taking $T' = M_1 \cap M_2$, it follows that $|T'| \ge \frac{m}{3}$ and $e(v, M \cup K) < \frac{\beta}{2}m + \frac{\beta}{2}k = \frac{\beta}{2}t$ for every $v \in T'$.

In the case that T contains no short ε-thick sets, every $v \in T$ has at most $o(t)$ vertices that are close to v, and the rest are far away, so edges to them occur with low probability. It is thus not hard to show with a first moment argument that T has the desired property. □

We are now ready to prove part b) of Theorem 1, which we now restate:

For all ε > 0, there are a.a.s. no clusters with more than $(\ln n)^{1/2+\epsilon}$ vertices.

The proof will occur in two steps. First, we give the following lemma:

Lemma 8. For all sufficiently small ε > 0, there are a.a.s. no clusters $Q = T \cup R$ of size at least $(\ln n)^{1/2+\epsilon}$, where T is a tall ε-thick set.

The proof of Lemma 8, achieved using the first moment method, is found in the appendix. This lemma rules out the most likely type of potential cluster, leaving only sets with at least $(\ln n)^{1/2+\epsilon}$ vertices that contain no tall ε-thick subsets. We now prove part b) of Theorem 1 by considering this case.

Proof. Clearly if the theorem holds for all sufficiently small ε > 0, then it holds for all ε > 0, so we may assume ε is as small as required by the lemmas above. Let Q be a potential cluster of size $q \ge (\ln n)^{1/2+\epsilon}$. By Lemma 8 we may assume Q contains no tall ε-thick sets; that is, Q contains no sets of height at most $h = \frac{(\ln n)^{1/2}}{\ln b}$ with at least $z = (\ln n)^{1/2+\epsilon/2}$ vertices. Hence, if we subdivide the vertices of G into $n/b^h$ complete sets of height h, Q must have fewer than z vertices inside each set. This implies that for all $v \in Q$, at least $q - z \ge (\ln n)^{1/2+\epsilon/2}$ vertices in Q are at height more than h from v; for Q to be a cluster, at least $\beta q - z \ge \frac{\beta}{2}q$ edges from v to these distant vertices must exist. Let $X_v$ denote the number of edges from v to Q that have height more than h, and let $X_Q = \frac12\sum_{v \in Q}X_v$ be the total number of such edges in Q. If $X_Q < \frac{\beta}{4}q^2$, then it follows that at least one vertex v has $X_v < \frac{\beta}{2}q$, implying that Q is not a cluster.

Since there are fewer than $\binom{q}{2}$ potential edges in Q with height more than h, each occurring with probability at most $c^{-h}$, $X_Q$ is stochastically dominated by the random variable $\mathrm{Bin}\left(\binom{q}{2}, c^{-h}\right)$, which has expected value $\binom{q}{2}c^{-h} = o(q^2)$. A concentration bound (see the appendix) therefore shows that

$$\Pr\left(X_Q \ge \tfrac{\beta}{4}q^2\right) \le \exp\left(-\frac{\beta\ln c}{8}\,q^2h\right). \tag{6.1}$$

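Now let X be the number of sets Q of size at least $(\ln n)^{1/2+\epsilon}$ such that $X_Q \ge \frac{\beta}{4}|Q|^2$. As noted above, it suffices to show that a.a.s. X = 0. Since there are fewer than $\binom{n}{q} < \exp(q\ln n)$ choices of size q for the set Q, we have

$$\mathbb{E}[X] < \sum_{q \ge (\ln n)^{1/2+\epsilon}}\binom{n}{q}\Pr\left(X_Q \ge \tfrac{\beta}{4}q^2\right) < \sum_{q \ge (\ln n)^{1/2+\epsilon}}\exp\left(q\ln n - \frac{\beta\ln c}{8}\,q^2h\right).$$

Since $h = \frac{(\ln n)^{1/2}}{\ln b}$ and $q \ge (\ln n)^{1/2+\epsilon}$, it follows that $q\ln n = o(q^2h)$ and the term in the exponential is negative. Hence, it is maximized when q is minimized; that is, when $q = (\ln n)^{1/2+\epsilon}$. The above becomes

$$\mathbb{E}[X] < \sum_{q \ge (\ln n)^{1/2+\epsilon}}\exp\left((\ln n)^{3/2+\epsilon} - \frac{\beta\ln c}{8\ln b}(\ln n)^{3/2+2\epsilon}\right) \le n\exp\left(-\frac{\beta\ln c}{10\ln b}(\ln n)^{3/2+2\epsilon}\right),$$

which tends to zero as $n \to \infty$. By the first moment method, this implies that a.a.s. X = 0. □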
7. Conclusions

This work deals with the question: what form do clusters take in real-world networks? In this case, we considered the existence of (α, β)-clusters in the CGA real-world random network model. We showed the existence of a size threshold of $(\ln n)^{1/2}$ for the existence of such clusters. As noted in Section 1, the CGA model captures many of the properties observed in real-world networks, and (α, β)-clusters capture a particular notion of clustering in real-world networks, in which clusters are denser than their surrounding neighbourhood, and in which clusters may overlap. Therefore, the choice of model and clustering definition seems valid for approaching the motivating question.

Thus, it is interesting to ask the extent to which the clustering threshold of $(\ln n)^{1/2}$ extends beyond the CGA model to random models for real-world networks in general. Can a threshold for (α, β)-clusters be observed in real-world data? Can other models for real-world networks be shown to have thresholds for cluster size? Do such size thresholds exist for other notions of clustering?

One goal of this work was to achieve our result analytically. Many real-world random network models are prohibitively hard to analyze, so simulation is often needed to establish the existence of desirable properties. We have avoided this approach and concentrated on analytic results.

Turning now to our result, an open question is whether or not a sharp threshold for (α, β)-clusters exists: we have not addressed the existence of (α, β)-clusters of size $(\ln n)^{1/2+f(n)}$, where $f(n) = o(1)$.

Part a) of Theorem 1 shows the existence of complete clusters at each height less than $(\frac12 - \epsilon)\frac{\ln\ln n}{\ln b}$, but a more complete treatment might consider the existence of other (i.e. non-complete) clusters of this size.



8. Appendix 

This section gives the full proofs of the results in this work. We begin by introducing some probabilistic tools that will be required.



8.1. Probabilistic Tools. We aim to characterize the asymptotic behaviour of G as the number of vertices n increases. Since G is only defined when n is a power of b, it is more correct to let H, the height of G, increase, and take $n = b^H$. However, little clarity is lost when taking asymptotics in relation to n.

The binomial random variable given by the number of successes over r independent trials, each succeeding with probability p, is denoted $\mathrm{Bin}(r,p)$.

The main probabilistic idea used is that of the first moment method: let X be a non-negative random variable that takes integral values, with expected value $\mathbb{E}[X]$. If $\mathbb{E}[X] = o(1)$, then by Markov's inequality, $\Pr(X \ge 1) = o(1)$. In other words, a.a.s. X = 0. This technique will be used repeatedly to establish that events a.a.s. do not occur.

The following lemma gives a bound on the upper tail of a binomial random variable:

Lemma 9. Let $X = \mathrm{Bin}(n,p)$ be a binomial random variable. Let t > 1 and $1 \le s = \lceil tpn \rceil \le n - 1$. Then

$$\Pr(X \ge tpn) \le \frac{t}{t-1}\binom{n}{s}p^s(1-p)^{n-s}.$$

A proof may be found in [14]. We will use a simplified form of this. Since $\binom{n}{s} \le \left(\frac{en}{s}\right)^s$ (which follows from Stirling's approximation), Lemma 9 implies that if $s \ge 2pn$, then

$$\Pr(X \ge s) \le 2\binom{n}{s}p^s \le 2\left(\frac{enp}{s}\right)^s \tag{8.1}$$

$$= 2\exp\left(s(\ln n + 1 - \ln s + \ln p)\right). \tag{8.2}$$
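The simplified bound (8.1)-(8.2) is easy to sanity-check numerically. The Python sketch below compares the exact upper tail $\Pr(X \ge s)$ of $\mathrm{Bin}(n,p)$, computed with `math.comb`, against $2(enp/s)^s$; in keeping with the hypothesis above it only evaluates the bound for $s \ge 2pn$. The exact tail uses plain floating point, so parameters are kept small.

```python
import math


def binom_upper_tail(n: int, p: float, s: int) -> float:
    """Exact Pr(Bin(n, p) >= s)."""
    return sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(s, n + 1))


def simplified_bound(n: int, p: float, s: int) -> float:
    """2 (e n p / s)^s = 2 exp(s (ln n + 1 - ln s + ln p)), stated only for s >= 2pn."""
    assert s >= 2 * p * n, "bound (8.1) is stated only for s >= 2pn"
    return 2 * math.exp(s * (math.log(n) + 1 - math.log(s) + math.log(p)))
```

Close to $s = 2pn$ the bound is typically larger than 1 and hence vacuous; it becomes useful, as in the proofs below, when $p$ is tiny compared with $s/n$.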

The final tool we will make use of is a Chernoff-type bound from Janson [26], which gives concentration bounds on a sum of independent Bernoulli random variables.

Lemma 10. Let $X = X_1 + \dots + X_k$, where the $X_i$ are independent Bernoulli random variables with $\Pr(X_i = 1) = p_i$. Let $\mu = \mathbb{E}[X] = \sum_i p_i$. Then for $t \ge 0$ we have:

$$\Pr(X \ge \mu + t) \le \exp\left(-\frac{t^2}{2(\mu + t/3)}\right)$$

and

$$\Pr(X \le \mu - t) \le \exp\left(-\frac{t^2}{2\mu}\right).$$

A proof may be found in [26].
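As an illustration only, the Python sketch below compares both tails of Lemma 10 with exact values in the special case where all $p_i$ are equal, so that X is binomial; it does not exercise the general heterogeneous case covered by the lemma.

```python
import math


def binom_pmf(n: int, p: float, k: int) -> float:
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def upper_tail(n: int, p: float, x: float) -> float:
    """Exact Pr(Bin(n, p) >= x)."""
    return sum(binom_pmf(n, p, k) for k in range(max(0, math.ceil(x)), n + 1))


def lower_tail(n: int, p: float, x: float) -> float:
    """Exact Pr(Bin(n, p) <= x); zero when x < 0."""
    return sum(binom_pmf(n, p, k) for k in range(0, math.floor(x) + 1))


def janson_upper(mu: float, t: float) -> float:
    """Bound on Pr(X >= mu + t) from Lemma 10."""
    return math.exp(-t * t / (2 * (mu + t / 3)))


def janson_lower(mu: float, t: float) -> float:
    """Bound on Pr(X <= mu - t) from Lemma 10."""
    return math.exp(-t * t / (2 * mu))
```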
8.2. Proofs. 

Lemma 3. Let $m^* = \frac{\ln b}{\alpha\ln c}$ and $h^* = (\frac12 - \epsilon)\frac{\ln\ln n}{\ln b}$. Then:

a: There are a.a.s. no externally sparse sets of size smaller than $m^*$.

b: If M is a set of size $m > m^*$ and height $h < h^*$, then there exists a constant a > 0 such that $\Pr(E_2(M)) > a$.

Proof. To prove a), first suppose M is a set of size $m < m^*$. The event E(M) that M is externally sparse holds if there does not exist any vertex $v \in V(G) \setminus M$ such that $e(v,M) > \alpha m$. Since m = O(1), we have $|V(G) \setminus M| \ge \frac{n}{2}$ for large enough n. The probability of an edge from v to a vertex in M is at least $p = c^{-\log_b n} = n^{-\log_b c}$. It follows that e(v,M) stochastically dominates the random variable $\mathrm{Bin}(m,p)$, and hence

$$\Pr(e(v,M) > \alpha m) \ge p^{\alpha m}. \tag{8.3}$$

E(M) will hold only if all of the at least $\frac{n}{2}$ vertices in $V(G) \setminus M$ have $\alpha m$ or fewer links to M, and these events are independent, so we have

$$\Pr(E(M)) \le \left(1 - p^{\alpha m}\right)^{n/2} \le \exp\left(-\tfrac12 np^{\alpha m}\right) = \exp\left(-\tfrac12 n^{1 - \alpha m\log_b c}\right). \tag{8.4}$$

Because $m < m^*$, the exponent $1 - \alpha m\log_b c$ is positive, so $n^{1-\alpha m\log_b c}$ goes to infinity as n increases, and this probability is exponentially small.

Now, we wish to show that a.a.s. for every set M of size less than $m^*$, E(M) does not hold. Let X be the number of externally sparse sets of size smaller than $m^*$; since there are at most $\binom{n}{m}$ sets of size m, it follows from (8.4) that

$$\mathbb{E}[X] \le \sum_{m < m^*}\binom{n}{m}\exp\left(-\tfrac12 n^{1-\alpha m\log_b c}\right). \tag{8.5}$$

For each $m < m^*$, $1 - \alpha m\log_b c > 0$, so the term

$$\binom{n}{m}\exp\left(-\tfrac12 n^{1-\alpha m\log_b c}\right) \le \exp\left(m\ln n - \tfrac12 n^{1-\alpha m\log_b c}\right) = o(1).$$

Hence, each of the O(1) terms inside (8.5) is o(1). It follows that $\mathbb{E}[X] = o(1)$, so by the first moment method, a.a.s. X = 0. This proves a).
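The mechanism behind part a) can be seen numerically: for $m < m^*$ the exponent $1 - \alpha m\log_b c$ is positive, so the natural logarithm of each term in (8.5) is at most $m\ln n - \frac12 n^{1-\alpha m\log_b c}$, which becomes very negative. The Python sketch below evaluates these logarithms directly (working in log space so that large n does not overflow); the parameter values are illustrative only, and for small n the terms can still be positive, since the statement is asymptotic.

```python
import math


def log_term(n: float, m: int, alpha: float, b: float, c: float) -> float:
    """ln of exp(m ln n - (1/2) n^(1 - alpha m log_b c)), the bound on each term of (8.5)."""
    expo = 1 - alpha * m * math.log(c) / math.log(b)
    return m * math.log(n) - 0.5 * n ** expo


def terms_below_threshold(n: float, alpha: float, b: float, c: float) -> list:
    """Log-bounds for every integer size 1 <= m < m* = ln b / (alpha ln c)."""
    m_star = math.log(b) / (alpha * math.log(c))
    return [log_term(n, m, alpha, b, c) for m in range(1, math.ceil(m_star))]
```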

To prove b), suppose $m > m^*$, and recall $h^* = (\frac12 - \epsilon)\frac{\ln\ln n}{\ln b}$. Let S = S(M) be the minimum complete set containing M. To show that $E_2$ holds for M, we need to show that $\forall u \in S(M,h^*) \setminus S(M)$, $e(u,M) \le \alpha m$. We wish to choose a new complete set S' that also contains M, and whose height h' is large enough that the vertices in $S(M,h^*) \setminus S'$ are likely to have at most αm edges to vertices in M. More explicitly, for our choice of h' and S', let A be the event that $\forall u \in S' \setminus S(M)$, $e(u,M) \le \alpha m$, and B be the event that $\forall u \in S(M,h^*) \setminus S'$, $e(u,M) \le \alpha m$. Then the events A and B depend on disjoint sets of edges and are therefore independent, and $E_2(M) = A \cap B$.

In particular, we will choose h' to be larger than some constant δ = δ(α, b, c). If m (and hence h) is increasing with n then surely we have h > δ. In this case, taking S' = S(M) and h' = h, we have that A trivially occurs. If m and h are constant, then h' will also be a constant that is possibly larger than h. Since S' is of constant size, there are a constant number of vertices in $S' \setminus S(M)$. Furthermore, each of these vertices will have at most αm neighbours in M with some constant probability, because M is of constant size. Hence, A will occur with at least constant probability, say $\Pr(A) > \gamma$.

Thus, it will suffice to show the event B occurs with at least constant probability. Given a vertex u of height $j > h'$ from M, there is a uniform probability $c^{-j}$ of an edge between u and a particular vertex in M. Let $X_u = \mathrm{Bin}(m, c^{-j})$ be the number of such links. Since $j > h' > \delta$, by taking δ = δ(α, b, c) to be large enough, we can require that $c^{-j} < \frac{\alpha}{2}$. Hence, $\alpha m > 2mc^{-j}$ and it follows from (8.1) with $s = \alpha m$ that

$$\Pr(X_u > \alpha m) \le 2\left(\frac{e}{\alpha c^j}\right)^{\alpha m}.$$

Now let $R_j$ be the event that there exists a $u \in G$ of height j from M such that $X_u > \alpha m$. There are fewer than $b^j$ such vertices, so by the union bound,

$$\Pr(R_j) \le 2b^j\left(\frac{e}{\alpha c^j}\right)^{\alpha m} = 2\left(\frac{e}{\alpha}\right)^{\alpha m}\left(\frac{b}{c^{\alpha m}}\right)^j.$$

B occurs if and only if none of the events $R_j$, $h' < j \le h^*$, occurs. Since $m > m^*$ implies $\frac{b}{c^{\alpha m}} < 1$, the union bound gives

$$\Pr(B) \ge 1 - \sum_{j > h'}\Pr(R_j) \ge 1 - \frac{2(e/\alpha)^{\alpha m}}{1 - b/c^{\alpha m}}\left(\frac{b}{c^{\alpha m}}\right)^{h'}. \tag{8.6}$$

Now, it suffices to show that the right side of (8.6) is at least $\frac12$, which is true when

$$h' \ge \frac{\ln 4 + \alpha m(1 - \ln\alpha + \ln c) - \ln\left(c^{\alpha m} - b\right)}{\alpha m\ln c - \ln b} \tag{8.7}$$

holds. We will see that this holds for $h' > \delta$ when δ = δ(α, b, c) is chosen to be sufficiently large. Because m is integer-valued and larger than $m^*$, $c^{\alpha m} - b$ is bounded below by a positive number, and so for m = O(1), each term in both the numerator and denominator of (8.7) is bounded. On the other hand, if m is increasing, then we will have for large enough m that

$$\frac{\ln 4 + \alpha m(1 - \ln\alpha + \ln c) - \ln\left(c^{\alpha m} - b\right)}{\alpha m\ln c - \ln b} < \frac{2 - \ln\alpha}{\ln c},$$

so that the right side of (8.7) does not depend on m.

Hence, (8.6) gives that $\Pr(B) \ge \frac12$, and since A and B are independent, $\Pr(E_2(M)) = \Pr(A)\Pr(B) > \frac{\gamma}{2}$, which suffices to prove b). □
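Condition (8.7) can be checked numerically as well. The Python sketch below computes the smallest integer h' satisfying (8.7) for given α, b, c and $m > m^*$, evaluates the right side of (8.6) at that h', and confirms it is at least 1/2. It follows the reconstructed form of (8.6) and (8.7) above (with the factor $(e/\alpha)^{\alpha m}$ coming from (8.1)); parameter values are illustrative.

```python
import math


def rho(alpha: float, b: int, c: float, m: int) -> float:
    """Ratio b / c^(alpha m), which is < 1 exactly when m > m*."""
    return b / c ** (alpha * m)


def pr_b_lower_bound(alpha: float, b: int, c: float, m: int, h_prime: int) -> float:
    """Right-hand side of (8.6): 1 - 2 (e/alpha)^(alpha m) rho^h' / (1 - rho)."""
    r = rho(alpha, b, c, m)
    return 1 - 2 * (math.e / alpha) ** (alpha * m) * r ** h_prime / (1 - r)


def min_h_prime(alpha: float, b: int, c: float, m: int) -> int:
    """Smallest non-negative integer h' satisfying (8.7)."""
    num = (math.log(4) + alpha * m * (1 - math.log(alpha) + math.log(c))
           - math.log(c ** (alpha * m) - b))
    den = alpha * m * math.log(c) - math.log(b)
    return max(0, math.ceil(num / den))
```

For $(\alpha, b, c) = (0.5, 2, 4)$ the required h' is 6 at m = 2 and drops to 2 by m = 40, which illustrates why a single constant δ suffices.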

The next lemma proves (5.1):

Lemma. Let $m^* = \frac{\ln b}{\alpha\ln c}$ and $h^* = (\frac12 - \epsilon)\frac{\ln\ln n}{\ln b}$. Let $\mathcal{M}$ be a family of sets such that each set $M \in \mathcal{M}$ is of size $m > m^*$ and height $h < h^*$. Furthermore, suppose for every distinct $M_1, M_2 \in \mathcal{M}$, the complete sets of height $h^*$, $S(M_1,h^*)$ and $S(M_2,h^*)$, do not intersect, and no edges between $S(M_1,h^*)$ and $S(M_2,h^*)$ have yet been exposed. (For each $M \in \mathcal{M}$, we allow any number of internal edges in $S(M,h^*)$ to have been already exposed.) Then a.a.s. at least

$$\min\left(|\mathcal{M}|,\ (\ln n)^{\frac{\alpha\ln c}{4\ln b}(m - m^*)}\right)$$

sets in $\mathcal{M}$ have $E_3$.

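Proof. Consider a set $M \in \mathcal{M}$, so that $m = |M|$. Given a vertex $u \notin M$ of height $j > h^*$ from M, there is a probability $c^{-j}$ of an edge between u and a particular vertex in M. Let $X_u = \mathrm{Bin}(m, c^{-j})$ be the total number of such edges. Since $c^{-j} = o(1)$, $\alpha m \ge 2mc^{-j}$ and it follows from (8.1) with $s = \alpha m$ that

$$\Pr(X_u > \alpha m) \le 2\left(\frac{e}{\alpha c^j}\right)^{\alpha m}.$$

Now let $A_j$ be the event that there exists a $u \in G$ of height $j > h^*$ from M such that $X_u > \alpha m$. There are $(b-1)b^{j-1} < b^j$ such vertices, so by the union bound,

$$\Pr(A_j) \le 2b^j\left(\frac{e}{\alpha c^j}\right)^{\alpha m}.$$

$E_3(M)$ occurs if and only if none of the events $A_j$, $j > h^*$, occurs. Therefore, since $m > m^*$ implies that $\frac{b}{c^{\alpha m}} < 1$, we have by the union bound that

$$\Pr(E_3(M)) \ge 1 - \sum_{j > h^*}\Pr(A_j) = 1 - 2\left(\frac{e}{\alpha}\right)^{\alpha m}\sum_{j > h^*}\left(\frac{b}{c^{\alpha m}}\right)^j \ge 1 - \frac{2(e/\alpha)^{\alpha m}\left(b/c^{\alpha m}\right)^{h^*}}{1 - b/c^{\alpha m}}. \tag{8.8}$$

Now, considering the term $(e/\alpha)^{\alpha m}\left(b/c^{\alpha m}\right)^{h^*}$ in the numerator of (8.8), and using $\ln b - \alpha m\ln c = -\alpha\ln c\,(m - m^*)$, we have:

$$\left(\frac{e}{\alpha}\right)^{\alpha m}\left(\frac{b}{c^{\alpha m}}\right)^{h^*} = \exp\left(\alpha m(1 - \ln\alpha) + h^*(\ln b - \alpha m\ln c)\right) = \exp\left(\alpha m(1 - \ln\alpha) - \left(\tfrac12 - \epsilon\right)\frac{\alpha\ln c}{\ln b}(m - m^*)\ln\ln n\right) \le \exp\left(-\frac{\alpha\ln c}{2.5\ln b}(m - m^*)\ln\ln n\right)$$

for large n, since ε is small and $m/(m - m^*)$ is bounded. Since $1 - \frac{b}{c^{\alpha m}}$ is bounded below by a positive constant, we have from (8.8) that

$$\Pr(E_3(M)) \ge 1 - \exp\left(-\frac{\alpha\ln c}{3\ln b}(m - m^*)\ln\ln n\right). \tag{8.9}$$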

Because removing edges can only help $E_3$ to occur, $E_3$ is a monotone (decreasing) property. More precisely, let $G_1$ and $G_2$ be graphs on the same vertex set V and let $E(G_1) \subseteq E(G_2)$. For any set $M \subseteq V$, if $E_3(M)$ holds in $G_2$ then it also holds in $G_1$. Hence, for two sets $M_1$ and $M_2$, Proposition 6.3.1 in [7] implies that $E_3(M_1)$ and $E_3(M_2)$ are positively correlated; that is:

$$\Pr\left(E_3(M_1) \cap E_3(M_2)\right) \ge \Pr(E_3(M_1))\Pr(E_3(M_2)). \tag{8.10}$$

Now, let $M_1, M_2, \dots$ be any ordering of the sets in $\mathcal{M}$. We will expose the edges necessary for $E_3(M_i)$ using this order. (8.10) implies that for any i,

$$\Pr\Big(E_3(M_i) \,\Big|\, \bigcap_{j < i}E_3(M_j)\Big) \ge \Pr(E_3(M_i)).$$

The union bound and (8.9) thus imply that for any value k,

$$\Pr\Big(\bigcap_{i \le k}E_3(M_i)\Big) \ge 1 - \sum_{i \le k}\Pr(\neg E_3(M_i)) \ge 1 - k\exp\left(-\frac{\alpha\ln c}{3\ln b}(m - m^*)\ln\ln n\right).$$

Hence, if

$$\ln k \le \frac{\alpha\ln c}{4\ln b}(m - m^*)\ln\ln n,$$

then, since $m - m^*$ is bounded below by a positive constant, the right side tends to 1, and it follows that a.a.s. every set $M_i$ for $1 \le i \le k$ has the property $E_3(M_i)$. Taking $k = \min\left(|\mathcal{M}|, (\ln n)^{\frac{\alpha\ln c}{4\ln b}(m - m^*)}\right)$ proves the lemma. □



In the proof of Theorem 1 a), the expected number of sets with $D \cap E_1 \cap E_2$ is $\Omega(n^{3/4})$. Setting $t = n^{2/3}$ in Lemma 10 gives that a.a.s. there is a subset $\mathcal{M}' \subseteq \mathcal{M}$ of size at least $\Omega(n^{3/4}) - n^{2/3} = \Omega(n^{3/4})$ such that each set in $\mathcal{M}'$ has $D \cap E_1 \cap E_2$.

Next, we give the proof of Theorem 1 a) in more detail, with the omitted calculations inserted:

Theorem 1a. Let $m^* = \frac{\ln b}{\alpha\ln c}$. There a.a.s. exist clusters of size larger than $(\ln n)^{1/2-\epsilon}$ in G. Moreover, there exists a constant γ = γ(α, b, c) > 0 such that for each h satisfying $\log_b m^* < h < (\frac12 - \epsilon)\frac{\ln\ln n}{\ln b}$, there a.a.s. exist at least $(\ln n)^{\gamma b^h}$ complete clusters of size $b^h$.

Proof. Note that the second part of the theorem statement implies the first, because a complete set of height $h^* = (\frac12 - \epsilon)\frac{\ln\ln n}{\ln b}$ has $b^{h^*} = (\ln n)^{1/2-\epsilon}$ vertices. Let M be a complete set of height h. We will first examine the event D that M is internally dense.

In fact, let us consider the stronger event of M being a clique. Each potential edge in M occurs with probability at least $c^{-h}$, so the probability of all $\binom{b^h}{2}$ edges occurring is at least

$$\left(c^{-h}\right)^{\binom{b^h}{2}} \ge c^{-hb^{2h}}.$$

Hence, $\Pr(D) \ge \exp(-\ln(c)\,hb^{2h})$.

Now, partition G into complete sets of height $h^* = (\frac12 - \epsilon)\frac{\ln\ln n}{\ln b}$. From each such set, choose a single complete set of height h, and let these $n/b^{h^*}$ disjoint sets form the family $\mathcal{M}$. Because these sets are complete, they automatically have $E_1$. Because they do not overlap and lie in different sets of height $h^*$, they will have D and $E_2$ independently. Lemma 3 b) implies they have $E_2$ with probability at least a for some positive constant a. Hence, for every $M \in \mathcal{M}$, we have that

$$\Pr(D \cap E_1 \cap E_2) \ge \Pr(D)\Pr(E_1)\Pr(E_2) \ge a\exp\left(-\ln(c)\,hb^{2h}\right). \tag{8.11}$$

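Now, the number of sets in $\mathcal{M}$ with $D \cap E_1 \cap E_2$ stochastically dominates the random variable

$$X = \mathrm{Bin}\left(\frac{n}{b^{h^*}},\ a\exp\left(-\ln(c)\,hb^{2h}\right)\right).$$

Since $h < h^*$, we have that

$$\mathbb{E}[X] \ge \frac{n}{b^{h^*}}\,a\exp\left(-\ln(c)\,h^*b^{2h^*}\right) = n\exp\left(\ln a - \ln(b)\,h^* - \ln(c)\,h^*b^{2h^*}\right) \ge n\exp\left(-2\ln(c)\,h^*b^{2h^*}\right)$$

for large n. Since $h^* = (\frac12 - \epsilon)\frac{\ln\ln n}{\ln b}$, so that $b^{2h^*} = (\ln n)^{1-2\epsilon}$, this becomes

$$\mathbb{E}[X] \ge n\exp\left(-(1 - 2\epsilon)\frac{\ln c}{\ln b}(\ln\ln n)(\ln n)^{1-2\epsilon}\right).$$

Notice that $\mathbb{E}[X]$ is asymptotically larger than $n^{3/4}$. Setting $t = n^{2/3}$, so that $\mathbb{E}[X] - t \ge \frac12\mathbb{E}[X]$ for large n, Lemma 10 gives that

$$\Pr\left(X < \tfrac12\mathbb{E}[X]\right) \le \exp\left(-\frac{t^2}{2\mathbb{E}[X]}\right) \le \exp\left(-\tfrac12 n^{1/3}\right),$$

using $\mathbb{E}[X] \le n$. In other words, a.a.s. there are at least $\frac12\mathbb{E}[X]$ sets in $\mathcal{M}$ with $D \cap E_1 \cap E_2$. Let $\mathcal{M}' \subseteq \mathcal{M}$ be this sub-family of sets. Because each set in $\mathcal{M}'$ lies in a different complete set of height $h^*$, and we have only exposed edges inside these complete sets of height $h^*$, the conditions of the lemma proving (5.1) apply to $\mathcal{M}'$. Hence at least $(\ln n)^{\frac{\alpha\ln c}{4\ln b}(b^h - m^*)}$ sets in $\mathcal{M}'$ also have $E_3$, and therefore are clusters.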

It remains to find a constant γ = γ(α, b, c) > 0 such that there are a.a.s. at least $(\ln n)^{\gamma b^h}$ clusters of size $b^h$. Recall that $m^* = \frac{\ln b}{\alpha\ln c}$, and let $h_{\min}$ be the minimum integral value of h such that $b^h > m^*$; setting

$$\gamma = \frac{\alpha\ln c\,\left(b^{h_{\min}} - m^*\right)}{4\ln b\; b^{h_{\min}}}$$

guarantees that $(\ln n)^{\gamma b^h} \le (\ln n)^{\frac{\alpha\ln c}{4\ln b}(b^h - m^*)}$, and hence there are a.a.s. at least $(\ln n)^{\gamma b^h}$ clusters of height h in G. □

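Lemma 5. For all sufficiently small ε > 0, a.a.s. for each short ε-thick set M, there exists a set $M_1 \subseteq M$ such that $|M_1| \ge \frac23|M|$ and $\forall v \in M_1$, $e(v,M) < \frac{\beta}{4}|M|$.

Proof. Let m = |M|, and let $S = S(M, h_\epsilon)$ be the complete set of height $h_\epsilon = (\frac12 + \epsilon)\frac{\ln\ln n}{\ln b}$ containing M. If S contains fewer than $\frac{\beta}{24}m^2$ internal edges, then it follows that there can be no set $M' \subseteq M$ of size $|M'| \ge \frac{m}{3}$ such that $\forall v \in M'$, $e(v,M) \ge \frac{\beta}{4}m$, since such a set would span at least $\frac12 \cdot \frac{m}{3} \cdot \frac{\beta}{4}m = \frac{\beta}{24}m^2$ edges of S. Since $m \ge (\ln n)^{1/2+\epsilon/3}$, it therefore suffices to show that a.a.s. all complete sets S of height $h_\epsilon$ have fewer than $\frac{\beta}{24}(\ln n)^{1+2\epsilon/3}$ internal edges.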

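Consider a complete set of height j. It contains b sets of height j − 1, each of which contains $b^{j-1}$ vertices. Hence, there are $\binom{b}{2}b^{2(j-1)}$ potential edges of height j in a single complete set of height j. Let the actual number of such edges be given by $X_j$; since each occurs with probability $c^{-j}$, we have that

$$\mathbb{E}[X_j] = \binom{b}{2}b^{2(j-1)}c^{-j}.$$

Now, let $X_S$ be the number of edges in S. We wish to show that $\mathbb{E}[X_S] = O(\ln n)$. Since there are $b^{h_\epsilon - j}$ complete sets of height j in S, we have that

$$\mathbb{E}[X_S] = \sum_{j=1}^{h_\epsilon}b^{h_\epsilon - j}\,\mathbb{E}[X_j] = \frac{b-1}{2b}\,b^{h_\epsilon}\sum_{j=1}^{h_\epsilon}\left(\frac{b}{c}\right)^j. \tag{8.12}$$

The sum in this expression is bounded as follows:

$$\sum_{j=1}^{h_\epsilon}\left(\frac{b}{c}\right)^j \le \begin{cases}\frac{b}{c-b} & \text{if } b < c,\\[2pt] h_\epsilon & \text{if } b = c,\\[2pt] \frac{b}{b-c}\left(\frac{b}{c}\right)^{h_\epsilon} & \text{if } b > c.\end{cases}$$

In the first two cases, this sum is small enough that (8.12) combined with the fact that $b^{h_\epsilon} = (\ln n)^{1/2+\epsilon}$ easily gives that $\mathbb{E}[X_S] = O(\ln n)$. In the case that b > c, we may rewrite (8.12) as follows:

$$\mathbb{E}[X_S] = O\left(b^{2h_\epsilon}\left(\tfrac{1}{c}\right)^{h_\epsilon}\right). \tag{8.13}$$

Now, since $h_\epsilon = (\frac12 + \epsilon)\frac{\ln\ln n}{\ln b}$, we have $b^{h_\epsilon} = (\ln n)^{1/2+\epsilon}$ and $\left(\frac1c\right)^{h_\epsilon} = (\ln n)^{-(\frac12+\epsilon)\frac{\ln c}{\ln b}}$. Since ε is sufficiently small, it follows that $(1 + 2\epsilon)\left(1 - \frac{\ln c}{2\ln b}\right) < 1$, so

$$b^{2h_\epsilon}\left(\tfrac1c\right)^{h_\epsilon} = (\ln n)^{(1+2\epsilon)\left(1 - \frac{\ln c}{2\ln b}\right)} < \ln n.$$

Hence, from (8.13) we have that $\mathbb{E}[X_S] = O(\ln n)$ holds as well when b > c, and therefore holds in all cases.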
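The identity (8.12) and the three-case bound can be checked directly. The Python sketch below evaluates $\mathbb{E}[X_S]$ both by summing over heights (number of complete sets of height j, times pairs of height j, times $c^{-j}$) and by the closed form in (8.12), and compares the geometric sum with the case bound. Treating $h_\epsilon$ as an integer height is a simplification made only for this illustration.

```python
import math


def expected_edges_bruteforce(b: int, c: float, h: int) -> float:
    """Sum over heights j of (#complete sets of height j) * (#pairs of height j) * c^-j."""
    return sum(b ** (h - j) * math.comb(b, 2) * b ** (2 * (j - 1)) * c ** (-j)
               for j in range(1, h + 1))


def expected_edges_closed_form(b: int, c: float, h: int) -> float:
    """Right-hand side of (8.12)."""
    return (b - 1) / (2 * b) * b ** h * sum((b / c) ** j for j in range(1, h + 1))


def geometric_case_bound(b: int, c: float, h: int) -> float:
    """Case bound on sum_{j=1}^h (b/c)^j for b < c, b = c and b > c."""
    if b < c:
        return b / (c - b)
    if b == c:
        return float(h)
    return b / (b - c) * (b / c) ** h
```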

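We now use this bound on the expected value of $X_S$ to bound the probability that $X_S \ge \frac{\beta}{24}(\ln n)^{1+2\epsilon/3}$. Set $t = (\ln n)^{1+\epsilon/2}$, so that $\mathbb{E}[X_S] \le t$ and $\mathbb{E}[X_S] + t \le 2(\ln n)^{1+\epsilon/2} < \frac{\beta}{24}(\ln n)^{1+2\epsilon/3}$ for large n. Applying Lemma 10 with this t, we have

$$\Pr\left(X_S \ge \tfrac{\beta}{24}(\ln n)^{1+2\epsilon/3}\right) \le \Pr\left(X_S \ge \mathbb{E}[X_S] + t\right) \le \exp\left(-\frac{t^2}{2(\mathbb{E}[X_S] + t/3)}\right) \le \exp\left(-\tfrac38(\ln n)^{1+\epsilon/2}\right).$$

There are $n/b^{h_\epsilon} < n$ complete sets of height $h_\epsilon$. Letting X denote the number of such sets with at least $\frac{\beta}{24}(\ln n)^{1+2\epsilon/3}$ internal edges, it follows that

$$\mathbb{E}[X] \le n\Pr\left(X_S \ge \tfrac{\beta}{24}(\ln n)^{1+2\epsilon/3}\right) \le \exp\left(\ln n - \tfrac38(\ln n)^{1+\epsilon/2}\right).$$

This goes to zero as $n \to \infty$, showing that a.a.s. X = 0. □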
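As a sanity check on the expectation in (8.12), the following sketch samples the edges of a single complete set of height h and averages the edge count over a few trials. It assumes the edge rule used above (a pair whose lowest common ancestor has height j is joined independently with probability $c^{-j}$) and labels the vertices $0, \dots, b^h - 1$ in the leaf order of the b-ary tree, so the height of a pair is the number of base-b digits that must be stripped before the labels agree. This is an illustrative simulation, not part of the proof.

```python
import random


def pair_height(u: int, v: int, b: int) -> int:
    """Height of the lowest common ancestor of leaves u and v in a complete b-ary tree."""
    j = 0
    while u != v:
        u //= b
        v //= b
        j += 1
    return j


def sample_edge_count(b: int, c: float, h: int, rng: random.Random) -> int:
    """Number of edges in one random sample of a complete set of height h."""
    size = b ** h
    count = 0
    for u in range(size):
        for v in range(u + 1, size):
            if rng.random() < c ** (-pair_height(u, v, b)):
                count += 1
    return count


def mean_edge_count(b: int, c: float, h: int, trials: int, seed: int = 0) -> float:
    """Average edge count over several independent samples (seeded for reproducibility)."""
    rng = random.Random(seed)
    return sum(sample_edge_count(b, c, h, rng) for _ in range(trials)) / trials
```

With b = 2, c = 3 and h = 6, (8.12) gives an expectation of about 29.2 edges, and the sample mean over 100 trials lands close to it.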
Lemma 6. For all ε > 0, a.a.s. for every tall ε-thick set $T = M \cup K$, where M is a short ε-thick set and K is a set with $|K| \ge \frac{\beta}{4}|M|$ such that K does not intersect $S(M, h_\epsilon)$, the complete set of height $h_\epsilon = (\frac12 + \epsilon)\frac{\ln\ln n}{\ln b}$, there exists a set $M_2 \subseteq M$ such that $|M_2| \ge \frac23|M|$ and $\forall v \in M_2$, $e(v,K) < \frac{\beta}{2}|K|$.

Proof. Let m = |M| and k = |K|, and let M and K be such that the conditions of the lemma hold. Since vertices in K are at height at least $h_\epsilon$ from M, each potential edge between any $v \in M$ and K occurs independently with probability at most $c^{-h_\epsilon}$. Hence, $\mathbb{E}[e(v,K)] \le kc^{-h_\epsilon} < \frac{\beta}{4}k$ for large enough n. Setting $t = \frac{\beta}{4}k$, by Lemma 10 we have that

$$\Pr\left(e(v,K) \ge \tfrac{\beta}{2}k\right) \le \exp\left(-\frac{t^2}{2\left(\frac{\beta}{4}k + t/3\right)}\right) = \exp\left(-\frac{3\beta}{32}k\right). \tag{8.14}$$

Now, let $M_2 = \{v \in M : e(v,K) < \frac{\beta}{2}k\}$ and let $X_{M,K} = |M \setminus M_2|$. We wish to show that a.a.s. for all valid choices of M and K, $X_{M,K} < \frac{m}{3}$. Let $P_{M,K}$ denote the probability that $X_{M,K} \ge \frac{m}{3}$. From (8.14), and since the edges from distinct vertices of M to K are disjoint, $X_{M,K}$ is stochastically dominated by the random variable $\mathrm{Bin}\left(m, \exp\left(-\frac{3\beta}{32}k\right)\right)$. Therefore, by (8.2), setting $s = \frac{m}{3}$, we have that

$$P_{M,K} \le 2\exp\left(\frac{m}{3}\left(\ln m + 1 - \ln\frac{m}{3} - \frac{3\beta}{32}k\right)\right) \le \exp\left(-\frac{\beta}{100}mk\right)$$

for large n.

Now, let S be a complete set of height $\frac{(\ln n)^{1/2}}{\ln b}$; we will count the expected number $\mathbb{E}[X_S]$ of tall ε-thick sets $T = M \cup K$ inside S such that $X_{M,K} \ge \frac{m}{3}$. We have that $\mathbb{E}[X_S] \le \sum_{M,K}P_{M,K}$, where this sum ranges over all valid choices of $T = M \cup K$ inside S. Fixing the set M and a size k, there are at most

$$\binom{|S|}{k} \le |S|^k = \exp\left(k(\ln n)^{1/2}\right)$$

sets K of size k; hence,

$$\mathbb{E}[X_S] \le \sum_M\sum_k\exp\left(k\left((\ln n)^{1/2} - \frac{\beta}{100}m\right)\right).$$

Since $m \ge (\ln n)^{1/2+\epsilon/3}$, the term in the exponential is negative, and hence is maximized when k is minimized; that is, when $k = \frac{\beta}{4}m$. Furthermore, K lies in S, so $k \le |S| = \exp\left((\ln n)^{1/2}\right)$. Hence,

$$\mathbb{E}[X_S] \le \sum_M\exp\left((\ln n)^{1/2}\right)\exp\left(\frac{\beta}{4}m\left((\ln n)^{1/2} - \frac{\beta}{100}m\right)\right) \le \sum_M\exp\left(-\frac{\beta^2}{500}m^2\right),$$

since $(\ln n)^{1/2} = o(m)$.



A THRESHOLD FOR CLUSTERS IN REAL-WORLD RANDOM NETWORKS 






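Now, within S there are $b^{(\ln n)^{1/2}/\ln b - h_\epsilon} \le \exp\left((\ln n)^{1/2}\right)$ complete sets of height $h_\epsilon$, and each has at most $\binom{(\ln n)^{1/2+\epsilon}}{m} \le \exp(m\ln\ln n)$ subsets M of size m, so there are at most $\exp\left((\ln n)^{1/2} + m\ln\ln n\right) \le \exp(2m\ln\ln n)$ choices total for the set M within S. Hence,

$$\mathbb{E}[X_S] \le \sum_m\exp(2m\ln\ln n)\exp\left(-\frac{\beta^2}{500}m^2\right) \le \sum_m\exp\left(-\frac{\beta^2}{1000}m^2\right),$$

since $m\ln\ln n = o(m^2)$. Again, the term in the sum is maximized when m is minimized; since $(\ln n)^{1/2+\epsilon/3} \le m \le (\ln n)^{1/2+\epsilon}$, it follows that

$$\mathbb{E}[X_S] \le (\ln n)^{1/2+\epsilon}\exp\left(-\frac{\beta^2}{1000}(\ln n)^{1+2\epsilon/3}\right) \le \exp\left(-\frac{\beta^2}{2000}(\ln n)^{1+2\epsilon/3}\right). \tag{8.15}$$

Now, there are fewer than n choices for the complete set S. Let $X = \sum_S X_S$ denote the total number of tall ε-thick sets $T = M \cup K$ such that $X_{M,K} \ge \frac{m}{3}$. Then $\mathbb{E}[X] \le n\max_S\mathbb{E}[X_S]$. Since $\ln n = o\left((\ln n)^{1+2\epsilon/3}\right)$, (8.15) implies that $\mathbb{E}[X] = o(1)$, and hence a.a.s. X = 0. □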

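Lemma 7. For all sufficiently small ε > 0, a.a.s. for every tall ε-thick set T, there exists a set $T' \subseteq T$ such that $|T'| \ge (\ln n)^{1/2+\epsilon/4}$ and $\forall v \in T'$, $e(v,T) < \frac{\beta}{2}|T|$.

Proof. We consider two cases: either T contains a short ε-thick set as a subset, or it does not. Let $t = |T|$.

First, suppose T contains a short ε-thick set M. Adding vertices in $S(M, h_\epsilon) \setminus M$ to M will preserve the property that M is a short ε-thick set, so we may assume that M is maximal. In other words, we may assume the set $T \setminus M$ does not intersect the complete set of height $h_\epsilon$, $S(M, h_\epsilon)$. Set $K = T \setminus M$, and let $m = |M|$ and $k = |K|$. By Lemma 5 there is a set $M_1 \subseteq M$ such that $|M_1| \ge \frac23 m$ and $\forall v \in M_1$, $e(v,M) < \frac{\beta}{4}m$. If $k < \frac{\beta}{4}m$, then even if every edge from v to K exists, we will still have $e(v, M \cup K) < \frac{\beta}{4}m + k < \frac{\beta}{2}(m + k)$ for every $v \in M_1$, and hence may choose $T' = M_1$. If $k \ge \frac{\beta}{4}m$, then by Lemma 6 there is some set $M_2 \subseteq M$ such that $|M_2| \ge \frac23 m$ and $\forall v \in M_2$, $e(v,K) < \frac{\beta}{2}k$. Taking $T' = M_1 \cap M_2$, it follows that $|T'| \ge \frac{m}{3}$ and $e(v, M \cup K) < \frac{\beta}{2}m + \frac{\beta}{2}k = \frac{\beta}{2}t$ for every $v \in T'$. In either case $|T'| \ge \frac{m}{3} \ge \frac13(\ln n)^{1/2+\epsilon/3} \ge (\ln n)^{1/2+\epsilon/4}$ for large n.

Otherwise, suppose T contains no short ε-thick sets. That is, each set of height $h_\epsilon = (\frac12 + \epsilon)\frac{\ln\ln n}{\ln b}$ contains fewer than $(\ln n)^{1/2+\epsilon/3} < \frac12 t$ vertices of T. For any $v \in T$, there are at least $\frac12 t$ vertices in T with height more than $h_\epsilon$ from v. Let $X_v$ denote the number of edges from v to T of height more than $h_\epsilon$. If $X_v < \frac{\beta}{4}t$, then certainly $e(v,T) < \frac{\beta}{4}t + (\ln n)^{1/2+\epsilon/3} < \frac{\beta}{2}t$ for large n. Now, set $X_T = \frac12\sum_{v \in T}X_v$ to be the total number of edges of height more than $h_\epsilon$ in T. If $X_T < \frac{\beta}{16}t^2$, then by a counting argument there must be a set $T' \subseteq T$ of size at least $\frac{t}{2} \ge (\ln n)^{1/2+\epsilon/4}$ such that $\forall v \in T'$, $X_v < \frac{\beta}{4}t$. Hence, it suffices to show that a.a.s. for all T, $X_T < \frac{\beta}{16}t^2$.

Since there are fewer than $\binom{t}{2}$ potential edges of height greater than $h_\epsilon$ in T, each occurring with probability at most $c^{-h_\epsilon}$, $X_T$ is stochastically dominated by the random variable $\mathrm{Bin}\left(\binom{t}{2}, c^{-h_\epsilon}\right)$. Since $\binom{t}{2}c^{-h_\epsilon} = o(t^2)$, we may apply (8.2) with $s = \frac{\beta}{16}t^2$, giving

$$\Pr\left(X_T \ge \tfrac{\beta}{16}t^2\right) \le 2\exp\left(\frac{\beta}{16}t^2\left(\ln\binom{t}{2} + 1 - \ln\left(\frac{\beta}{16}t^2\right) - h_\epsilon\ln c\right)\right) \le \exp\left(-\frac{\beta\ln c}{32}\,t^2h_\epsilon\right).$$

Finally, let X denote the number of sets T such that $X_T \ge \frac{\beta}{16}t^2$. Then since there are at most $\exp\left(t(\ln n)^{1/2}\right)$ sets T of size t contained within a single complete set of height $\frac{(\ln n)^{1/2}}{\ln b}$, and there are fewer than n such complete sets, we have that

$$\mathbb{E}[X] \le n\sum_t\exp\left(t\left((\ln n)^{1/2} - \frac{\beta\ln c}{32}\,th_\epsilon\right)\right).$$

Since $t \ge (\ln n)^{1/2+\epsilon/2}$, we have that $(\ln n)^{1/2} = o(th_\epsilon)$. Thus, the term in the exponential is negative and decreasing in t, so it is maximized when $t = (\ln n)^{1/2+\epsilon/2}$. The above becomes

$$\mathbb{E}[X] \le n\sum_t\exp\left((\ln n)^{1/2+\epsilon/2}\left((\ln n)^{1/2} - \frac{\beta\ln c}{32}(\ln n)^{1/2+\epsilon/2}h_\epsilon\right)\right) \le \exp\left(2\ln n - (\ln n)^{1+\epsilon}\right).$$

Hence $\mathbb{E}[X] = o(1)$, showing that a.a.s. $X_T < \frac{\beta}{16}t^2$ for every set T. □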
Lemma 8. For all sufficiently small ε > 0, there are a.a.s. no clusters $Q = T \cup R$ of size at least $(\ln n)^{1/2+\epsilon}$, where T is a tall ε-thick set.

Proof. Let $t = |T|$ and $r = |R|$. T is contained within the complete set $S\left(T, \frac{(\ln n)^{1/2}}{\ln b}\right)$, and adding vertices of Q from $S\left(T, \frac{(\ln n)^{1/2}}{\ln b}\right) \setminus T$ to T will preserve the property that T is a tall ε-thick set. Hence, we may assume T is maximal; that is, that the vertices in R are at height at least $\frac{(\ln n)^{1/2}}{\ln b}$ from the vertices in T. By Lemma 7 there exists a set $T' \subseteq T$, $|T'| \ge (\ln n)^{1/2+\epsilon/4}$, such that $e(v,T) < \frac{\beta}{2}t$ for every $v \in T'$. For $T \cup R$ to be a cluster, we must have that $e(v, T \cup R) \ge \beta(t + r)$. Hence, for $v \in T'$, it follows that $e(v,R) > \frac{\beta}{2}t + \beta r$. This implies first that $r > \frac{\beta}{2}t$, and second, that we must have $e(v,R) > \beta r$ for every $v \in T'$. The remainder of the proof will show that a.a.s. no sets $T \cup R$ satisfy this condition.

Since $v \in T'$ is at height at least $\frac{(\ln n)^{1/2}}{\ln b}$ from each vertex in R, e(v,R) is stochastically dominated by the random variable $\mathrm{Bin}\left(r, \exp\left(-\frac{\ln c}{\ln b}(\ln n)^{1/2}\right)\right)$. By (8.2) with $s = \beta r$, we have

$$\Pr(e(v,R) > \beta r) \le 2\exp\left(\beta r\left(1 - \ln\beta - \frac{\ln c}{\ln b}(\ln n)^{1/2}\right)\right) \le \exp\left(-\frac{\beta\ln c}{2\ln b}\,r(\ln n)^{1/2}\right).$$

Since the edges within T are independent of the edges between T and R, and distinct vertices of T' have disjoint sets of potential edges to R, this bound holds independently for all vertices $v \in T'$. Since $|T'| \ge (\ln n)^{1/2+\epsilon/4}$, the probability $p_T$ that $e(v,R) > \beta r$ holds for all $v \in T'$ is therefore bounded:

$$p_T \le \Pr(e(v,R) > \beta r)^{|T'|} \le \exp\left(-\frac{\beta\ln c}{2\ln b}\,r(\ln n)^{1+\epsilon/4}\right).$$

Now, let X denote the total number of clusters of the form $T \cup R$. There are at most $\binom{n}{r} \le \exp(r\ln n)$ sets of size r, so

$$\mathbb{E}[X] \le \sum_T\sum_{r > \beta t/2}\binom{n}{r}p_T \le \sum_T\sum_{r > \beta t/2}\exp\left(r\left(\ln n - \frac{\beta\ln c}{2\ln b}(\ln n)^{1+\epsilon/4}\right)\right).$$

The term in the exponential is negative, and therefore maximized when $r = \frac{\beta}{2}t$. For a fixed size t, there are fewer than $n\binom{\exp((\ln n)^{1/2})}{t} \le n\exp\left(t(\ln n)^{1/2}\right)$ choices for the tall ε-thick set T, and hence

$$\mathbb{E}[X] \le \sum_t n^2\exp\left(t(\ln n)^{1/2} + \frac{\beta t}{2}\left(\ln n - \frac{\beta\ln c}{2\ln b}(\ln n)^{1+\epsilon/4}\right)\right) \le \sum_t n^2\exp\left(-\frac{\beta^2\ln c}{5\ln b}\,t(\ln n)^{1+\epsilon/4}\right).$$

Since $t(\ln n)^{1/2} = o\left(t(\ln n)^{1+\epsilon/4}\right)$, this exponential term is decreasing in t and hence maximized for the minimum value of t; that is, for $t = (\ln n)^{1/2+\epsilon/2}$. Thus

$$\mathbb{E}[X] \le n^3\exp\left(-\frac{\beta^2\ln c}{5\ln b}(\ln n)^{3/2+3\epsilon/4}\right) \le n\exp\left(-\frac{\beta^2\ln c}{10\ln b}(\ln n)^{3/2+3\epsilon/4}\right).$$

Since $\ln n = o\left((\ln n)^{3/2+3\epsilon/4}\right)$, we have that $\mathbb{E}[X] = o(1)$, implying that a.a.s. X = 0. □
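Finally, we fill in a detail from the proof of Theorem 1 b). To get (6.1), we use (8.2) with $s = \frac{\beta}{4}q^2$, giving:

$$\Pr\left(X_Q \ge \tfrac{\beta}{4}q^2\right) \le 2\exp\left(\frac{\beta}{4}q^2\left(\ln\binom{q}{2} + 1 - \ln\left(\frac{\beta}{4}q^2\right) - h\ln c\right)\right) \le \exp\left(-\frac{\beta\ln c}{8}\,q^2h\right),$$

since $\ln\binom{q}{2} - \ln\left(\frac{\beta}{4}q^2\right) < \ln\frac{2}{\beta}$ is bounded while $h\ln c \to \infty$.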